Vivian Xia

MSDS458 Research Assignment 02

The CIFAR-10 dataset (Canadian Institute For Advanced Research) is a collection of images commonly used to train machine learning and computer vision algorithms, and it is one of the most widely used datasets in machine learning research. It contains 60,000 32x32 color images in 10 classes: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks, with 6,000 images per class.

The CIFAR-10 dataset
https://www.cs.toronto.edu/~kriz/cifar.html

Import packages needed

In [1]:
import os
import time
import numpy as np
import pandas as pd
import seaborn as sns
from packaging import version

from sklearn.metrics import confusion_matrix, classification_report
from sklearn.metrics import accuracy_score
from sklearn.metrics import mean_squared_error as MSE
from sklearn.model_selection import train_test_split
from sklearn.manifold import TSNE

import matplotlib.pyplot as plt
import matplotlib as mpl

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import models, layers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, BatchNormalization, Dropout, Flatten, Input, Dense
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping
from tensorflow.keras.preprocessing import image
from tensorflow.keras.utils import to_categorical
In [2]:
%matplotlib inline
np.set_printoptions(precision=3, suppress=True)

Verify TensorFlow Version and Keras Version

In [3]:
print("This notebook requires TensorFlow 2.0 or above")
print("TensorFlow version: ", tf.__version__)
assert version.parse(tf.__version__).release[0] >=2
This notebook requires TensorFlow 2.0 or above
TensorFlow version:  2.7.0
In [4]:
print("Keras version: ", keras.__version__)
Keras version:  2.7.0

Mount Google Drive to Colab Environment

In [5]:
from google.colab import drive
drive.mount('/content/drive')

os.chdir('/content/drive/My Drive/Colab Notebooks/MSDS458/Assignment 2/Models/')
Mounted at /content/drive

Functions for Research Assignment

In [6]:
def plot_confusion_matrix(conf_mx):
    fig, ax = plt.subplots(figsize=(8,8))
    sns.heatmap(conf_mx, annot=True, fmt='.2f', cbar=False, ax=ax, cmap=plt.cm.gray)
    plt.ylabel('true label')
    plt.xlabel('predicted label')

Loading cifar10 Dataset

The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.
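The stated class balance can be spot-checked with `np.unique` once the data is loaded. A minimal sketch on a toy label array (the same call works on the real `train_labels`, which has shape (50000, 1)):

```python
import numpy as np

# Toy stand-in for a label array: 30 labels drawn from 3 classes, 10 each.
labels = np.repeat(np.arange(3), 10).reshape(-1, 1)

# np.unique with return_counts=True reports how many images fall in each class.
classes, counts = np.unique(labels, return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))  # {0: 10, 1: 10, 2: 10}
```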

In [7]:
(train_images, train_labels), (test_images, test_labels) = keras.datasets.cifar10.load_data()
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170500096/170498071 [==============================] - 3s 0us/step
170508288/170498071 [==============================] - 3s 0us/step
  • Tuple of Numpy arrays: (x_train, y_train), (x_test, y_test).
  • x_train, x_test: uint8 arrays of color image data with shape (num_samples, 32, 32, 3).
  • y_train, y_test: uint8 arrays of category labels (integers in the range 0-9) with shape (num_samples, 1).

EDA Training and Test Datasets

  • Imported 50000 examples for training and 10000 examples for test
  • Imported 50000 labels for training and 10000 labels for test
In [8]:
print('train_images:\t{}'.format(train_images.shape))
print('train_labels:\t{}'.format(train_labels.shape))
print('test_images:\t\t{}'.format(test_images.shape))
print('test_labels:\t\t{}'.format(test_labels.shape))
train_images:	(50000, 32, 32, 3)
train_labels:	(50000, 1)
test_images:		(10000, 32, 32, 3)
test_labels:		(10000, 1)

Review labels for training dataset

In [9]:
print("First ten labels training dataset:\n {}\n".format(train_labels[0:10]))
print("These are numeric labels; they need to be converted to class names")
First ten labels training dataset:
 [[6]
 [9]
 [9]
 [4]
 [1]
 [1]
 [2]
 [7]
 [8]
 [3]]

These are numeric labels; they need to be converted to class names

Plot Examples

In [10]:
def get_three_classes(x, y):
    def indices_of(class_id):
        indices, _ = np.where(y == float(class_id))
        return indices

    indices = np.concatenate([indices_of(0), indices_of(1), indices_of(2)], axis=0)
    
    x = x[indices]
    y = y[indices]
    
    count = x.shape[0]
    indices = np.random.choice(range(count), count, replace=False)
    
    x = x[indices]
    y = y[indices]
    
    y = tf.keras.utils.to_categorical(y)
    
    return x, y
In [11]:
# Preview examples from the first three classes of the test set
x_preview, y_preview = get_three_classes(test_images, test_labels)
In [12]:
class_names_preview = ['airplane', 'automobile', 'bird']

def show_random_examples(x, y, p):
    indices = np.random.choice(range(x.shape[0]), 10, replace=False)
    
    x = x[indices]
    y = y[indices]
    p = p[indices]
    
    plt.figure(figsize=(10, 5))
    for i in range(10):
        plt.subplot(2, 5, i + 1)
        plt.imshow(x[i])
        plt.xticks([])
        plt.yticks([])
        col = 'green' if np.argmax(y[i]) == np.argmax(p[i]) else 'red'
        plt.xlabel(class_names_preview[np.argmax(p[i])], color=col)
    plt.show()

# True labels are passed in place of predictions here, so all labels render green
show_random_examples(x_preview, y_preview, y_preview)

Preprocessing Data for Model Development

The labels are an array of integers, ranging from 0 to 9. These correspond to the class of object the image represents:

Label  Class
0      airplane
1      automobile
2      bird
3      cat
4      deer
5      dog
6      frog
7      horse
8      ship
9      truck
In [13]:
class_names = ['airplane'
,'automobile'
,'bird'
,'cat'
,'deer'
,'dog'
,'frog' 
,'horse'
,'ship'
,'truck']
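With class_names defined, the numeric labels printed earlier can be decoded. A small sketch; the `label_to_name` helper is illustrative, not part of the assignment code:

```python
import numpy as np

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

def label_to_name(labels):
    # Flatten (n, 1) label arrays and look up each class name.
    return [class_names[int(i)] for i in np.ravel(labels)]

# The first three training labels seen earlier were 6, 9, 9.
print(label_to_name(np.array([[6], [9], [9]])))  # ['frog', 'truck', 'truck']
```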

Create Validation Data Set

In [14]:
train_images_split, valid_images_split, train_labels_split, valid_labels_split = train_test_split(train_images
                                                            ,train_labels,test_size=.1,random_state=42,shuffle=True)

Confirm Datasets {Train, Validation, Test}

In [15]:
print(train_images_split.shape, valid_images_split.shape, test_images.shape)
(45000, 32, 32, 3) (5000, 32, 32, 3) (10000, 32, 32, 3)

Rescale Images {Train, Validation, Test}

The images are 32x32x3 NumPy arrays, with pixel values ranging from 0 to 255.

  1. Each element in each example is a pixel value
  2. Pixel values range from 0 to 255
  3. 0 = black
  4. 255 = white
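The rescaling below divides by 255 to map pixel values into [0, 1]. A toy check of what that division does (the shape here is illustrative; the real arrays are (n, 32, 32, 3)):

```python
import numpy as np

# Toy uint8 pixel batch spanning the full 0-255 range.
images = np.array([[0, 128, 255]], dtype=np.uint8)

# Dividing by 255 promotes the array to float and maps values into [0, 1].
images_norm = images / 255
print(images_norm.min(), images_norm.max())  # 0.0 1.0
```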
In [16]:
train_images_norm = train_images_split/255
valid_images_norm = valid_images_split/255
test_images_norm = test_images/255

Experiment 1

DNN with 2 hidden layers (no regularization)

Create the Model

Build DNN Model

In [32]:
model = models.Sequential()
model.add(layers.Flatten(input_shape=(32, 32, 3)))
model.add(layers.Dense(units=108, activation=tf.nn.relu))
model.add(layers.Dense(units=200, activation=tf.nn.relu))
model.add(layers.Dense(units=10, activation=tf.nn.softmax, name="output_layer"))
In [18]:
model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 flatten (Flatten)           (None, 3072)              0         
                                                                 
 dense (Dense)               (None, 108)               331884    
                                                                 
 dense_1 (Dense)             (None, 200)               21800     
                                                                 
 output_layer (Dense)        (None, 10)                2010      
                                                                 
=================================================================
Total params: 355,694
Trainable params: 355,694
Non-trainable params: 0
_________________________________________________________________
In [19]:
keras.utils.plot_model(model, "CIFAR10.png", show_shapes=True) 
Out[19]:

Compiling the model

tf.keras.losses.SparseCategoricalCrossentropy
https://www.tensorflow.org/api_docs/python/tf/keras/losses/SparseCategoricalCrossentropy
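Sparse categorical cross-entropy accepts integer class ids directly, so the labels do not need one-hot encoding. A minimal NumPy sketch of the computation, assuming the model outputs probabilities rather than logits (i.e. from_logits=False):

```python
import numpy as np

def sparse_cce(y_true, y_prob):
    # For each sample, take the predicted probability of the true class,
    # then average the negative log-likelihoods across the batch.
    picked = y_prob[np.arange(len(y_true)), y_true]
    return float(np.mean(-np.log(picked)))

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])   # integer class ids, as in train_labels_split
print(round(sparse_cce(labels, probs), 4))  # 0.2899
```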
In [20]:
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['accuracy'])

Training the model

tf.keras.callbacks.ModelCheckpoint
https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/ModelCheckpoint
In [21]:
start = time.time()

history = model.fit(train_images_norm
                    ,train_labels_split
                    ,epochs=30
                    ,batch_size=500
                    ,validation_data=(valid_images_norm, valid_labels_split)
                    ,callbacks=[
                    tf.keras.callbacks.ModelCheckpoint('/content/drive/My Drive/Colab Notebooks/MSDS458/Assignment 2/Models/model_{val_accuracy:.4f}.h5', save_best_only=True,
                                        save_weights_only=False, monitor='val_accuracy')] 
                   )

print("Total time: ", time.time() - start, "seconds")
Epoch 1/30
90/90 [==============================] - 3s 10ms/step - loss: 1.9626 - accuracy: 0.2890 - val_loss: 1.8403 - val_accuracy: 0.3468
Epoch 2/30
90/90 [==============================] - 1s 7ms/step - loss: 1.7709 - accuracy: 0.3692 - val_loss: 1.7345 - val_accuracy: 0.3806
Epoch 3/30
90/90 [==============================] - 1s 7ms/step - loss: 1.6898 - accuracy: 0.3982 - val_loss: 1.6875 - val_accuracy: 0.4000
Epoch 4/30
90/90 [==============================] - 1s 7ms/step - loss: 1.6428 - accuracy: 0.4149 - val_loss: 1.6557 - val_accuracy: 0.4014
Epoch 5/30
90/90 [==============================] - 1s 7ms/step - loss: 1.5904 - accuracy: 0.4356 - val_loss: 1.6160 - val_accuracy: 0.4214
Epoch 6/30
90/90 [==============================] - 1s 7ms/step - loss: 1.5609 - accuracy: 0.4495 - val_loss: 1.5862 - val_accuracy: 0.4372
Epoch 7/30
90/90 [==============================] - 1s 6ms/step - loss: 1.5257 - accuracy: 0.4605 - val_loss: 1.5740 - val_accuracy: 0.4338
Epoch 8/30
90/90 [==============================] - 1s 7ms/step - loss: 1.5100 - accuracy: 0.4676 - val_loss: 1.5497 - val_accuracy: 0.4446
Epoch 9/30
90/90 [==============================] - 1s 7ms/step - loss: 1.4773 - accuracy: 0.4803 - val_loss: 1.5183 - val_accuracy: 0.4512
Epoch 10/30
90/90 [==============================] - 1s 7ms/step - loss: 1.4620 - accuracy: 0.4832 - val_loss: 1.5244 - val_accuracy: 0.4524
Epoch 11/30
90/90 [==============================] - 1s 9ms/step - loss: 1.4514 - accuracy: 0.4873 - val_loss: 1.5091 - val_accuracy: 0.4562
Epoch 12/30
90/90 [==============================] - 1s 11ms/step - loss: 1.4336 - accuracy: 0.4954 - val_loss: 1.4963 - val_accuracy: 0.4638
Epoch 13/30
90/90 [==============================] - 1s 10ms/step - loss: 1.4143 - accuracy: 0.5006 - val_loss: 1.4776 - val_accuracy: 0.4684
Epoch 14/30
90/90 [==============================] - 1s 9ms/step - loss: 1.3994 - accuracy: 0.5048 - val_loss: 1.4959 - val_accuracy: 0.4576
Epoch 15/30
90/90 [==============================] - 1s 10ms/step - loss: 1.3969 - accuracy: 0.5057 - val_loss: 1.5061 - val_accuracy: 0.4698
Epoch 16/30
90/90 [==============================] - 1s 11ms/step - loss: 1.3820 - accuracy: 0.5127 - val_loss: 1.4931 - val_accuracy: 0.4676
Epoch 17/30
90/90 [==============================] - 1s 11ms/step - loss: 1.3757 - accuracy: 0.5147 - val_loss: 1.4634 - val_accuracy: 0.4778
Epoch 18/30
90/90 [==============================] - 1s 11ms/step - loss: 1.3603 - accuracy: 0.5209 - val_loss: 1.4655 - val_accuracy: 0.4790
Epoch 19/30
90/90 [==============================] - 1s 9ms/step - loss: 1.3466 - accuracy: 0.5257 - val_loss: 1.4608 - val_accuracy: 0.4742
Epoch 20/30
90/90 [==============================] - 1s 10ms/step - loss: 1.3400 - accuracy: 0.5278 - val_loss: 1.4809 - val_accuracy: 0.4720
Epoch 21/30
90/90 [==============================] - 1s 9ms/step - loss: 1.3302 - accuracy: 0.5316 - val_loss: 1.4540 - val_accuracy: 0.4764
Epoch 22/30
90/90 [==============================] - 1s 12ms/step - loss: 1.3229 - accuracy: 0.5335 - val_loss: 1.4382 - val_accuracy: 0.4882
Epoch 23/30
90/90 [==============================] - 1s 10ms/step - loss: 1.3091 - accuracy: 0.5381 - val_loss: 1.4478 - val_accuracy: 0.4808
Epoch 24/30
90/90 [==============================] - 1s 9ms/step - loss: 1.3023 - accuracy: 0.5400 - val_loss: 1.4258 - val_accuracy: 0.4848
Epoch 25/30
90/90 [==============================] - 1s 10ms/step - loss: 1.2965 - accuracy: 0.5419 - val_loss: 1.4512 - val_accuracy: 0.4858
Epoch 26/30
90/90 [==============================] - 1s 10ms/step - loss: 1.2826 - accuracy: 0.5470 - val_loss: 1.4596 - val_accuracy: 0.4820
Epoch 27/30
90/90 [==============================] - 1s 10ms/step - loss: 1.2842 - accuracy: 0.5451 - val_loss: 1.4533 - val_accuracy: 0.4866
Epoch 28/30
90/90 [==============================] - 1s 11ms/step - loss: 1.2759 - accuracy: 0.5490 - val_loss: 1.4329 - val_accuracy: 0.4898
Epoch 29/30
90/90 [==============================] - 1s 10ms/step - loss: 1.2673 - accuracy: 0.5547 - val_loss: 1.4349 - val_accuracy: 0.4822
Epoch 30/30
90/90 [==============================] - 1s 13ms/step - loss: 1.2544 - accuracy: 0.5594 - val_loss: 1.4166 - val_accuracy: 0.4970
Total time:  28.207395315170288 seconds

Evaluate the model

To ensure the model is not simply memorizing the training data, evaluate its performance on the held-out test set.

In [22]:
loss, accuracy = model.evaluate(test_images_norm, test_labels)
print('test set accuracy: ', accuracy)
313/313 [==============================] - 2s 6ms/step - loss: 1.3799 - accuracy: 0.5114
test set accuracy:  0.5113999843597412

Plotting Performance Metrics

Matplotlib is used to create two side-by-side plots: training and validation loss, and training and validation accuracy, per epoch.

In [23]:
history_dict = history.history
history_dict.keys()
Out[23]:
dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])
In [24]:
losses = history.history['loss']
accs = history.history['accuracy']
val_losses = history.history['val_loss']
val_accs = history.history['val_accuracy']
epochs = len(losses)
In [25]:
plt.figure(figsize=(16, 4))
for i, metrics in enumerate(zip([losses, accs], [val_losses, val_accs], ['Loss', 'Accuracy'])):
    plt.subplot(1, 2, i + 1)
    plt.plot(range(epochs), metrics[0], label='Training {}'.format(metrics[2]))
    plt.plot(range(epochs), metrics[1], label='Validation {}'.format(metrics[2]))
    plt.legend()
plt.show()

Confusion matrices

Using sklearn.metrics, visualize the confusion matrix.

In [26]:
pred1 = model.predict(test_images_norm)
pred1 = np.argmax(pred1, axis=1)
In [27]:
conf_mx = confusion_matrix(test_labels, pred1)
row_sums = conf_mx.sum(axis=1, keepdims=True)
norm_conf_mx = np.round((conf_mx / row_sums), 2)
In [28]:
plot_confusion_matrix(norm_conf_mx)

Predictions

Load HDF5 Model Format

tf.keras.models.load_model
https://www.tensorflow.org/api_docs/python/tf/keras/models/load_model
In [29]:
model = tf.keras.models.load_model('/content/drive/My Drive/Colab Notebooks/MSDS458/Assignment 2/Models/model_0.4970.h5')
In [31]:
preds = model.predict(test_images_norm)
preds.shape
Out[31]:
(10000, 10)

Visualize predictions

In [32]:
cm = sns.light_palette((260, 75, 60), input="husl", as_cmap=True)
In [33]:
df = pd.DataFrame(preds[0:20], columns = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'])
df.style.format("{:.2%}").background_gradient(cmap=cm) 
Out[33]:
  airplane automobile bird cat deer dog frog horse ship truck
0 0.92% 2.41% 1.02% 49.68% 13.94% 17.52% 2.38% 0.57% 11.27% 0.30%
1 2.67% 14.88% 0.04% 0.08% 0.07% 0.01% 0.01% 0.03% 46.80% 35.40%
2 26.46% 11.74% 0.21% 0.17% 0.12% 0.10% 0.01% 0.56% 42.89% 17.76%
3 24.06% 1.15% 14.56% 3.70% 12.42% 0.85% 0.06% 10.77% 31.92% 0.53%
4 0.03% 0.01% 6.90% 0.94% 77.21% 1.49% 13.30% 0.07% 0.05% 0.00%
5 1.27% 0.24% 2.43% 29.24% 1.91% 11.39% 48.65% 3.54% 0.20% 1.13%
6 12.25% 32.22% 6.09% 24.86% 0.04% 17.37% 1.85% 2.51% 1.38% 1.43%
7 1.73% 0.25% 8.18% 3.24% 3.26% 0.52% 82.24% 0.25% 0.26% 0.08%
8 3.65% 0.05% 45.62% 9.57% 24.11% 8.36% 1.11% 7.24% 0.28% 0.02%
9 0.53% 87.43% 0.95% 0.37% 0.12% 0.11% 0.01% 0.26% 0.80% 9.42%
10 32.59% 0.13% 6.56% 5.16% 3.19% 2.18% 2.78% 0.54% 46.81% 0.04%
11 0.04% 23.90% 0.01% 0.14% 0.01% 0.01% 0.01% 0.17% 3.65% 72.05%
12 0.59% 0.81% 11.60% 21.76% 5.84% 10.25% 45.11% 3.39% 0.33% 0.34%
13 7.38% 0.06% 0.30% 0.03% 0.05% 0.11% 0.02% 91.96% 0.03% 0.06%
14 0.87% 37.37% 1.74% 0.91% 0.04% 1.19% 0.13% 1.44% 0.07% 56.25%
15 4.39% 0.82% 1.04% 8.37% 6.08% 16.54% 2.48% 0.61% 58.68% 1.01%
16 0.24% 2.28% 0.68% 12.91% 0.11% 20.92% 0.27% 61.16% 0.38% 1.06%
17 8.96% 0.10% 10.25% 17.80% 21.86% 10.11% 2.41% 17.61% 2.25% 8.66%
18 1.32% 1.60% 0.01% 0.01% 0.08% 0.00% 0.00% 0.03% 96.05% 0.90%
19 0.12% 0.13% 1.53% 5.76% 1.02% 1.88% 60.79% 28.25% 0.02% 0.51%

Plot t-SNE Embedding

In [34]:
# Extracts the outputs of all layers:
layer_outputs = [layer.output for layer in model.layers]

# Creates a model that will return these outputs, given the model input:
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)

# Get activation values for the first hidden dense layer and the output layer
activations = activation_model.predict(valid_images_norm[:5000])
dense_layer_activations = activations[-3]
output_layer_activations = activations[-1]
In [35]:
# Reduce the dimensionality with t-SNE to visualize in a scatterplot
tsne = TSNE(n_components=2, verbose=1, perplexity=40, n_iter=300)
tsne_results = tsne.fit_transform(dense_layer_activations)

# Scaling
tsne_results = (tsne_results - tsne_results.min()) / (tsne_results.max() - tsne_results.min())
/usr/local/lib/python3.7/dist-packages/sklearn/manifold/_t_sne.py:783: FutureWarning: The default initialization in TSNE will change from 'random' to 'pca' in 1.2.
  FutureWarning,
/usr/local/lib/python3.7/dist-packages/sklearn/manifold/_t_sne.py:793: FutureWarning: The default learning rate in TSNE will change from 200.0 to 'auto' in 1.2.
  FutureWarning,
[t-SNE] Computing 121 nearest neighbors...
[t-SNE] Indexed 5000 samples in 0.001s...
[t-SNE] Computed neighbors for 5000 samples in 1.386s...
[t-SNE] Computed conditional probabilities for sample 1000 / 5000
[t-SNE] Computed conditional probabilities for sample 2000 / 5000
[t-SNE] Computed conditional probabilities for sample 3000 / 5000
[t-SNE] Computed conditional probabilities for sample 4000 / 5000
[t-SNE] Computed conditional probabilities for sample 5000 / 5000
[t-SNE] Mean sigma: 3.151020
[t-SNE] KL divergence after 250 iterations with early exaggeration: 82.975967
[t-SNE] KL divergence after 300 iterations: 2.933155
In [36]:
cmap = plt.cm.tab10
plt.figure(figsize=(16,10))
scatter = plt.scatter(tsne_results[:,0],tsne_results[:,1], c=valid_labels_split[:5000], s=10, cmap=cmap)
plt.legend(handles=scatter.legend_elements()[0], labels=class_names)

image_positions = np.array([[1., 1.]])
for index, position in enumerate(tsne_results):
    dist = np.sum((position - image_positions) ** 2, axis=1)
    if np.min(dist) > 0.02: # if far enough from other images
        image_positions = np.r_[image_positions, [position]]
        imagebox = mpl.offsetbox.AnnotationBbox(
            mpl.offsetbox.OffsetImage(valid_images_split[index], cmap="binary"),
            position, bboxprops={"lw": 1})
        plt.gca().add_artist(imagebox)
plt.axis("off")
plt.show()

Experiment 2

DNN with 3 hidden layers (no regularization)

Create the Model

Build DNN Model

In [37]:
model = models.Sequential()
model.add(layers.Flatten(input_shape=(32, 32, 3)))
model.add(layers.Dense(units=108, activation=tf.nn.relu))
model.add(layers.Dense(units=200, activation=tf.nn.relu))
model.add(layers.Dense(units=282, activation=tf.nn.relu))
model.add(layers.Dense(units=10, activation=tf.nn.softmax, name="output_layer"))
In [38]:
model.summary()
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 flatten_1 (Flatten)         (None, 3072)              0         
                                                                 
 dense_2 (Dense)             (None, 108)               331884    
                                                                 
 dense_3 (Dense)             (None, 200)               21800     
                                                                 
 dense_4 (Dense)             (None, 282)               56682     
                                                                 
 output_layer (Dense)        (None, 10)                2830      
                                                                 
=================================================================
Total params: 413,196
Trainable params: 413,196
Non-trainable params: 0
_________________________________________________________________
In [39]:
keras.utils.plot_model(model, "CIFAR10.png", show_shapes=True) 
Out[39]:

Compiling the model

tf.keras.losses.SparseCategoricalCrossentropy
https://www.tensorflow.org/api_docs/python/tf/keras/losses/SparseCategoricalCrossentropy
In [40]:
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['accuracy'])

Training the model

In [41]:
start = time.time()

history = model.fit(train_images_norm
                    ,train_labels_split
                    ,epochs=30
                    ,batch_size=500
                    ,validation_data=(valid_images_norm, valid_labels_split)
                   )

print("Total time: ", time.time() - start, "seconds")
Epoch 1/30
90/90 [==============================] - 1s 9ms/step - loss: 1.9474 - accuracy: 0.2896 - val_loss: 1.7923 - val_accuracy: 0.3358
Epoch 2/30
90/90 [==============================] - 1s 7ms/step - loss: 1.7231 - accuracy: 0.3810 - val_loss: 1.7125 - val_accuracy: 0.3872
Epoch 3/30
90/90 [==============================] - 1s 7ms/step - loss: 1.6288 - accuracy: 0.4189 - val_loss: 1.6509 - val_accuracy: 0.4104
Epoch 4/30
90/90 [==============================] - 1s 7ms/step - loss: 1.5760 - accuracy: 0.4364 - val_loss: 1.5846 - val_accuracy: 0.4300
Epoch 5/30
90/90 [==============================] - 1s 7ms/step - loss: 1.5241 - accuracy: 0.4580 - val_loss: 1.5682 - val_accuracy: 0.4372
Epoch 6/30
90/90 [==============================] - 1s 7ms/step - loss: 1.4947 - accuracy: 0.4691 - val_loss: 1.5314 - val_accuracy: 0.4492
Epoch 7/30
90/90 [==============================] - 1s 7ms/step - loss: 1.4552 - accuracy: 0.4837 - val_loss: 1.5216 - val_accuracy: 0.4500
Epoch 8/30
90/90 [==============================] - 1s 7ms/step - loss: 1.4492 - accuracy: 0.4826 - val_loss: 1.5148 - val_accuracy: 0.4580
Epoch 9/30
90/90 [==============================] - 1s 7ms/step - loss: 1.4159 - accuracy: 0.4955 - val_loss: 1.4922 - val_accuracy: 0.4622
Epoch 10/30
90/90 [==============================] - 1s 7ms/step - loss: 1.3999 - accuracy: 0.5002 - val_loss: 1.5355 - val_accuracy: 0.4538
Epoch 11/30
90/90 [==============================] - 1s 7ms/step - loss: 1.3787 - accuracy: 0.5071 - val_loss: 1.4760 - val_accuracy: 0.4654
Epoch 12/30
90/90 [==============================] - 1s 7ms/step - loss: 1.3603 - accuracy: 0.5150 - val_loss: 1.4736 - val_accuracy: 0.4760
Epoch 13/30
90/90 [==============================] - 1s 7ms/step - loss: 1.3420 - accuracy: 0.5241 - val_loss: 1.4886 - val_accuracy: 0.4696
Epoch 14/30
90/90 [==============================] - 1s 7ms/step - loss: 1.3351 - accuracy: 0.5258 - val_loss: 1.4527 - val_accuracy: 0.4782
Epoch 15/30
90/90 [==============================] - 1s 7ms/step - loss: 1.3081 - accuracy: 0.5344 - val_loss: 1.4349 - val_accuracy: 0.4824
Epoch 16/30
90/90 [==============================] - 1s 7ms/step - loss: 1.2928 - accuracy: 0.5396 - val_loss: 1.4428 - val_accuracy: 0.4774
Epoch 17/30
90/90 [==============================] - 1s 7ms/step - loss: 1.2786 - accuracy: 0.5462 - val_loss: 1.4790 - val_accuracy: 0.4768
Epoch 18/30
90/90 [==============================] - 1s 6ms/step - loss: 1.2630 - accuracy: 0.5527 - val_loss: 1.4379 - val_accuracy: 0.4860
Epoch 19/30
90/90 [==============================] - 1s 6ms/step - loss: 1.2452 - accuracy: 0.5588 - val_loss: 1.4290 - val_accuracy: 0.4868
Epoch 20/30
90/90 [==============================] - 1s 6ms/step - loss: 1.2374 - accuracy: 0.5617 - val_loss: 1.4590 - val_accuracy: 0.4908
Epoch 21/30
90/90 [==============================] - 0s 6ms/step - loss: 1.2250 - accuracy: 0.5658 - val_loss: 1.4420 - val_accuracy: 0.4920
Epoch 22/30
90/90 [==============================] - 0s 6ms/step - loss: 1.2035 - accuracy: 0.5726 - val_loss: 1.4211 - val_accuracy: 0.4982
Epoch 23/30
90/90 [==============================] - 1s 6ms/step - loss: 1.1985 - accuracy: 0.5735 - val_loss: 1.4353 - val_accuracy: 0.4972
Epoch 24/30
90/90 [==============================] - 1s 6ms/step - loss: 1.1726 - accuracy: 0.5824 - val_loss: 1.4226 - val_accuracy: 0.4934
Epoch 25/30
90/90 [==============================] - 0s 6ms/step - loss: 1.1725 - accuracy: 0.5822 - val_loss: 1.4313 - val_accuracy: 0.4936
Epoch 26/30
90/90 [==============================] - 1s 6ms/step - loss: 1.1505 - accuracy: 0.5913 - val_loss: 1.4257 - val_accuracy: 0.4988
Epoch 27/30
90/90 [==============================] - 1s 6ms/step - loss: 1.1429 - accuracy: 0.5939 - val_loss: 1.4104 - val_accuracy: 0.5066
Epoch 28/30
90/90 [==============================] - 1s 7ms/step - loss: 1.1240 - accuracy: 0.6015 - val_loss: 1.4226 - val_accuracy: 0.4980
Epoch 29/30
90/90 [==============================] - 1s 6ms/step - loss: 1.1094 - accuracy: 0.6060 - val_loss: 1.4127 - val_accuracy: 0.5088
Epoch 30/30
90/90 [==============================] - 1s 6ms/step - loss: 1.1071 - accuracy: 0.6058 - val_loss: 1.4364 - val_accuracy: 0.5000
Total time:  19.19623064994812 seconds

Evaluate the model

In [42]:
loss, accuracy = model.evaluate(test_images_norm, test_labels)
print('test set accuracy: ', accuracy)
313/313 [==============================] - 1s 3ms/step - loss: 1.3994 - accuracy: 0.5066
test set accuracy:  0.506600022315979

Plotting Performance Metrics

Matplotlib is used to create two side-by-side plots: training and validation loss, and training and validation accuracy, per epoch.

In [43]:
history_dict = history.history
history_dict.keys()
Out[43]:
dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])
In [44]:
losses = history.history['loss']
accs = history.history['accuracy']
val_losses = history.history['val_loss']
val_accs = history.history['val_accuracy']
epochs = len(losses)
In [45]:
plt.figure(figsize=(16, 4))
for i, metrics in enumerate(zip([losses, accs], [val_losses, val_accs], ['Loss', 'Accuracy'])):
    plt.subplot(1, 2, i + 1)
    plt.plot(range(epochs), metrics[0], label='Training {}'.format(metrics[2]))
    plt.plot(range(epochs), metrics[1], label='Validation {}'.format(metrics[2]))
    plt.legend()
plt.show()

Confusion matrices

Using sklearn.metrics, visualize the confusion matrix.

In [46]:
pred1 = model.predict(test_images_norm)
pred1 = np.argmax(pred1, axis=1)
In [47]:
conf_mx = confusion_matrix(test_labels, pred1)
row_sums = conf_mx.sum(axis=1, keepdims=True)
norm_conf_mx = np.round((conf_mx / row_sums), 2)
In [48]:
plot_confusion_matrix(norm_conf_mx)

Visualize predictions

In [49]:
preds = model.predict(test_images_norm)
preds.shape
Out[49]:
(10000, 10)
In [50]:
cm = sns.light_palette((260, 75, 60), input="husl", as_cmap=True)
In [51]:
df = pd.DataFrame(preds[0:20], columns = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'])
df.style.format("{:.2%}").background_gradient(cmap=cm) 
Out[51]:
  airplane automobile bird cat deer dog frog horse ship truck
0 0.70% 2.97% 1.85% 54.15% 17.66% 5.89% 0.18% 0.14% 13.23% 3.24%
1 0.15% 9.94% 0.04% 0.03% 0.01% 0.01% 0.00% 0.03% 23.29% 66.50%
2 8.37% 51.14% 0.27% 0.04% 0.08% 0.01% 0.00% 0.30% 37.74% 2.04%
3 7.29% 8.85% 30.75% 6.10% 12.54% 1.81% 0.04% 16.33% 13.05% 3.22%
4 0.05% 0.00% 1.67% 0.55% 86.39% 1.09% 10.17% 0.06% 0.01% 0.00%
5 0.67% 0.10% 1.33% 17.75% 0.89% 7.96% 68.46% 2.69% 0.05% 0.10%
6 13.48% 26.56% 0.42% 26.41% 0.17% 3.88% 4.65% 1.51% 20.97% 1.96%
7 0.13% 1.61% 4.16% 2.57% 0.95% 0.64% 88.65% 0.02% 0.02% 1.26%
8 1.75% 0.05% 32.97% 21.25% 12.22% 24.17% 0.75% 6.53% 0.04% 0.26%
9 0.33% 50.97% 0.48% 0.28% 0.02% 0.35% 0.04% 0.17% 2.23% 45.12%
10 55.17% 0.13% 5.31% 4.16% 2.89% 4.09% 2.02% 0.36% 25.74% 0.13%
11 0.01% 16.42% 0.03% 0.06% 0.00% 0.01% 0.04% 0.02% 2.96% 80.45%
12 5.03% 2.98% 26.99% 16.72% 8.51% 16.14% 17.15% 1.26% 3.88% 1.35%
13 15.04% 7.19% 4.42% 1.15% 7.58% 3.16% 0.19% 60.44% 0.76% 0.07%
14 0.66% 31.29% 6.19% 1.04% 0.03% 0.37% 0.64% 0.52% 0.11% 59.15%
15 6.21% 0.01% 0.94% 7.38% 6.27% 6.07% 0.62% 0.09% 72.38% 0.03%
16 0.44% 0.68% 0.42% 28.45% 0.05% 51.91% 0.35% 3.14% 0.12% 14.44%
17 2.73% 1.66% 9.89% 18.00% 16.73% 8.34% 2.85% 8.64% 1.12% 30.04%
18 1.39% 1.63% 0.05% 0.14% 0.13% 0.02% 0.01% 0.02% 94.64% 1.98%
19 0.03% 0.02% 1.80% 7.24% 0.93% 2.18% 86.72% 0.99% 0.00% 0.09%

Plot t-SNE Embedding

In [52]:
# Extracts the outputs of all layers:
layer_outputs = [layer.output for layer in model.layers]

# Creates a model that will return these outputs, given the model input:
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)

# Get activation values for the first hidden dense layer and the output layer
activations = activation_model.predict(valid_images_norm[:5000])
dense_layer_activations = activations[-4]
output_layer_activations = activations[-1]
In [53]:
# Reduce the dimensionality with t-SNE to visualize in a scatterplot
tsne = TSNE(n_components=2, verbose=1, perplexity=40, n_iter=300)
tsne_results = tsne.fit_transform(dense_layer_activations)

# Scaling
tsne_results = (tsne_results - tsne_results.min()) / (tsne_results.max() - tsne_results.min())
/usr/local/lib/python3.7/dist-packages/sklearn/manifold/_t_sne.py:783: FutureWarning: The default initialization in TSNE will change from 'random' to 'pca' in 1.2.
  FutureWarning,
/usr/local/lib/python3.7/dist-packages/sklearn/manifold/_t_sne.py:793: FutureWarning: The default learning rate in TSNE will change from 200.0 to 'auto' in 1.2.
  FutureWarning,
[t-SNE] Computing 121 nearest neighbors...
[t-SNE] Indexed 5000 samples in 0.001s...
[t-SNE] Computed neighbors for 5000 samples in 0.650s...
[t-SNE] Computed conditional probabilities for sample 1000 / 5000
[t-SNE] Computed conditional probabilities for sample 2000 / 5000
[t-SNE] Computed conditional probabilities for sample 3000 / 5000
[t-SNE] Computed conditional probabilities for sample 4000 / 5000
[t-SNE] Computed conditional probabilities for sample 5000 / 5000
[t-SNE] Mean sigma: 3.090226
[t-SNE] KL divergence after 250 iterations with early exaggeration: 83.743637
[t-SNE] KL divergence after 300 iterations: 2.916745
In [54]:
cmap = plt.cm.tab10
plt.figure(figsize=(16,10))
scatter = plt.scatter(tsne_results[:,0],tsne_results[:,1], c=valid_labels_split[:5000], s=10, cmap=cmap)
plt.legend(handles=scatter.legend_elements()[0], labels=class_names)

image_positions = np.array([[1., 1.]])
for index, position in enumerate(tsne_results):
    dist = np.sum((position - image_positions) ** 2, axis=1)
    if np.min(dist) > 0.02: # if far enough from other images
        image_positions = np.r_[image_positions, [position]]
        imagebox = mpl.offsetbox.AnnotationBbox(
            mpl.offsetbox.OffsetImage(valid_images_split[index], cmap="binary"),
            position, bboxprops={"lw": 1})
        plt.gca().add_artist(imagebox)
plt.axis("off")
plt.show()

Experiment 3

CNN with 2 convolution/max pooling layers (no regularization)
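The feature-map sizes that model.summary() reports below can be checked by hand: with 'valid' padding, a kxk convolution at stride 1 maps an nxn input to (n-k+1)x(n-k+1), and 2x2 max pooling at stride 2 floor-halves each side. A quick arithmetic sketch:

```python
def conv_out(n, k, stride=1):
    # 'valid' padding: output size = floor((n - k) / stride) + 1
    return (n - k) // stride + 1

def pool_out(n, pool=2, stride=2):
    return (n - pool) // stride + 1

n = 32                 # CIFAR-10 input height/width
n = conv_out(n, 3)     # Conv2D 3x3 -> 30
n = pool_out(n)        # MaxPool 2x2 -> 15
n = conv_out(n, 3)     # Conv2D 3x3 -> 13
n = pool_out(n)        # MaxPool 2x2 -> 6
print(n, n * n * 108)  # 6 3888 (flattened features with 108 filters)
```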

Create the Model

Build CNN Model

In [55]:
model = models.Sequential()

model.add(layers.Conv2D(filters=64, kernel_size=(3, 3), strides=(1, 1), activation=tf.nn.relu, input_shape=(32, 32, 3)))
model.add(layers.MaxPool2D(pool_size=(2, 2), strides=2))

model.add(layers.Conv2D(filters=108, kernel_size=(3, 3), strides=(1, 1), activation=tf.nn.relu))
model.add(layers.MaxPool2D(pool_size=(2, 2), strides=2))

model.add(layers.Flatten())
model.add(layers.Dense(units=210, activation=tf.nn.relu))
model.add(layers.Dense(units=10, activation=tf.nn.softmax, name="output_layer"))
In [56]:
model.summary()
Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d (Conv2D)             (None, 30, 30, 64)        1792      
                                                                 
 max_pooling2d (MaxPooling2D  (None, 15, 15, 64)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 13, 13, 108)       62316     
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 6, 6, 108)        0         
 2D)                                                             
                                                                 
 flatten_2 (Flatten)         (None, 3888)              0         
                                                                 
 dense_5 (Dense)             (None, 210)               816690    
                                                                 
 output_layer (Dense)        (None, 10)                2110      
                                                                 
=================================================================
Total params: 882,908
Trainable params: 882,908
Non-trainable params: 0
_________________________________________________________________
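The parameter counts reported by model.summary() can be checked by hand. As a sketch (plain arithmetic, no Keras required): a Conv2D layer holds (kernel_h * kernel_w * in_channels + 1) * filters weights, and a Dense layer holds (inputs + 1) * units, where the +1 accounts for the bias:

```python
# Hand-check of the parameter counts in the summary above.
conv1 = (3 * 3 * 3 + 1) * 64       # 1,792
conv2 = (3 * 3 * 64 + 1) * 108     # 62,316
dense = (6 * 6 * 108 + 1) * 210    # 816,690 (flatten output is 6*6*108 = 3,888)
output = (210 + 1) * 10            # 2,110
total = conv1 + conv2 + dense + output
print(total)                       # matches Total params: 882,908
```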
In [57]:
keras.utils.plot_model(model, "CIFAR10.png", show_shapes=True) 
Out[57]:

Compiling the model

tf.keras.losses.SparseCategoricalCrossentropy
https://www.tensorflow.org/api_docs/python/tf/keras/losses/SparseCategoricalCrossentropy
In [58]:
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['accuracy'])
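Since the output layer already applies softmax, the loss is built with from_logits=False. A minimal NumPy sketch (illustrative values, not from the dataset) of what sparse categorical cross-entropy computes:

```python
import numpy as np

# Sketch of SparseCategoricalCrossentropy(from_logits=False): each row of
# `probs` is already a softmax output, and `labels` holds integer class
# indices (which is why no one-hot encoding of the labels is needed).
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])

# Per-sample loss: negative log-probability assigned to the true class.
losses = -np.log(probs[np.arange(len(labels)), labels])
print(losses.mean())  # average of -log(0.7) and -log(0.8)
```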

Training the model

In [59]:
start = time.time()

history = model.fit(train_images_norm
                    ,train_labels_split
                    ,epochs=30
                    ,batch_size=500
                    ,validation_data=(valid_images_norm, valid_labels_split)
                   )

print("Total time: ", time.time() - start, "seconds")
Epoch 1/30
90/90 [==============================] - 10s 27ms/step - loss: 1.7183 - accuracy: 0.3798 - val_loss: 1.4817 - val_accuracy: 0.4746
Epoch 2/30
90/90 [==============================] - 2s 24ms/step - loss: 1.3625 - accuracy: 0.5171 - val_loss: 1.2913 - val_accuracy: 0.5390
Epoch 3/30
90/90 [==============================] - 2s 24ms/step - loss: 1.2227 - accuracy: 0.5702 - val_loss: 1.2314 - val_accuracy: 0.5656
Epoch 4/30
90/90 [==============================] - 2s 24ms/step - loss: 1.1300 - accuracy: 0.6058 - val_loss: 1.1198 - val_accuracy: 0.6054
Epoch 5/30
90/90 [==============================] - 2s 24ms/step - loss: 1.0657 - accuracy: 0.6270 - val_loss: 1.0840 - val_accuracy: 0.6206
Epoch 6/30
90/90 [==============================] - 2s 24ms/step - loss: 1.0062 - accuracy: 0.6492 - val_loss: 1.0592 - val_accuracy: 0.6248
Epoch 7/30
90/90 [==============================] - 2s 24ms/step - loss: 0.9639 - accuracy: 0.6669 - val_loss: 0.9939 - val_accuracy: 0.6466
Epoch 8/30
90/90 [==============================] - 2s 24ms/step - loss: 0.9048 - accuracy: 0.6895 - val_loss: 0.9520 - val_accuracy: 0.6662
Epoch 9/30
90/90 [==============================] - 2s 24ms/step - loss: 0.8589 - accuracy: 0.7037 - val_loss: 0.9623 - val_accuracy: 0.6584
Epoch 10/30
90/90 [==============================] - 2s 23ms/step - loss: 0.8212 - accuracy: 0.7180 - val_loss: 0.9144 - val_accuracy: 0.6782
Epoch 11/30
90/90 [==============================] - 3s 28ms/step - loss: 0.7839 - accuracy: 0.7303 - val_loss: 0.9128 - val_accuracy: 0.6778
Epoch 12/30
90/90 [==============================] - 2s 26ms/step - loss: 0.7645 - accuracy: 0.7362 - val_loss: 0.9177 - val_accuracy: 0.6788
Epoch 13/30
90/90 [==============================] - 3s 28ms/step - loss: 0.7238 - accuracy: 0.7529 - val_loss: 0.8718 - val_accuracy: 0.6928
Epoch 14/30
90/90 [==============================] - 3s 33ms/step - loss: 0.6775 - accuracy: 0.7682 - val_loss: 0.8685 - val_accuracy: 0.6998
Epoch 15/30
90/90 [==============================] - 3s 30ms/step - loss: 0.6444 - accuracy: 0.7799 - val_loss: 0.9059 - val_accuracy: 0.6902
Epoch 16/30
90/90 [==============================] - 3s 30ms/step - loss: 0.6170 - accuracy: 0.7885 - val_loss: 0.8979 - val_accuracy: 0.6934
Epoch 17/30
90/90 [==============================] - 3s 35ms/step - loss: 0.5842 - accuracy: 0.8024 - val_loss: 0.8840 - val_accuracy: 0.7028
Epoch 18/30
90/90 [==============================] - 2s 25ms/step - loss: 0.5610 - accuracy: 0.8071 - val_loss: 0.9044 - val_accuracy: 0.6904
Epoch 19/30
90/90 [==============================] - 2s 24ms/step - loss: 0.5270 - accuracy: 0.8212 - val_loss: 0.8734 - val_accuracy: 0.7036
Epoch 20/30
90/90 [==============================] - 2s 24ms/step - loss: 0.4959 - accuracy: 0.8345 - val_loss: 0.8959 - val_accuracy: 0.7068
Epoch 21/30
90/90 [==============================] - 2s 24ms/step - loss: 0.4784 - accuracy: 0.8377 - val_loss: 0.9044 - val_accuracy: 0.7044
Epoch 22/30
90/90 [==============================] - 2s 24ms/step - loss: 0.4300 - accuracy: 0.8571 - val_loss: 0.9075 - val_accuracy: 0.7070
Epoch 23/30
90/90 [==============================] - 2s 24ms/step - loss: 0.4043 - accuracy: 0.8643 - val_loss: 0.9539 - val_accuracy: 0.7004
Epoch 24/30
90/90 [==============================] - 2s 24ms/step - loss: 0.3814 - accuracy: 0.8744 - val_loss: 0.9381 - val_accuracy: 0.7104
Epoch 25/30
90/90 [==============================] - 2s 24ms/step - loss: 0.3413 - accuracy: 0.8892 - val_loss: 0.9360 - val_accuracy: 0.7138
Epoch 26/30
90/90 [==============================] - 2s 24ms/step - loss: 0.3220 - accuracy: 0.8946 - val_loss: 0.9864 - val_accuracy: 0.7006
Epoch 27/30
90/90 [==============================] - 2s 24ms/step - loss: 0.2953 - accuracy: 0.9051 - val_loss: 1.0269 - val_accuracy: 0.7022
Epoch 28/30
90/90 [==============================] - 2s 24ms/step - loss: 0.2809 - accuracy: 0.9092 - val_loss: 1.0054 - val_accuracy: 0.7068
Epoch 29/30
90/90 [==============================] - 2s 24ms/step - loss: 0.2364 - accuracy: 0.9274 - val_loss: 1.0645 - val_accuracy: 0.7110
Epoch 30/30
90/90 [==============================] - 2s 24ms/step - loss: 0.2201 - accuracy: 0.9318 - val_loss: 1.1110 - val_accuracy: 0.7084
Total time:  77.74068093299866 seconds

Evaluate the model

In [60]:
loss, accuracy = model.evaluate(test_images_norm, test_labels)
print('test set accuracy: ', accuracy)
313/313 [==============================] - 1s 4ms/step - loss: 1.1319 - accuracy: 0.7033
test set accuracy:  0.7032999992370605

Plotting Performance Metrics

Matplotlib is used to create two plots side by side, displaying the training and validation loss and accuracy for each training epoch.

In [61]:
history_dict = history.history
history_dict.keys()
Out[61]:
dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])
In [62]:
losses = history.history['loss']
accs = history.history['accuracy']
val_losses = history.history['val_loss']
val_accs = history.history['val_accuracy']
epochs = len(losses)
In [63]:
plt.figure(figsize=(16, 4))
for i, metrics in enumerate(zip([losses, accs], [val_losses, val_accs], ['Loss', 'Accuracy'])):
    plt.subplot(1, 2, i + 1)
    plt.plot(range(epochs), metrics[0], label='Training {}'.format(metrics[2]))
    plt.plot(range(epochs), metrics[1], label='Validation {}'.format(metrics[2]))
    plt.legend()
plt.show()

Confusion matrices

Using sklearn.metrics, visualize the confusion matrix.

In [64]:
pred1 = model.predict(test_images_norm)
pred1 = np.argmax(pred1, axis=1)
In [65]:
conf_mx = confusion_matrix(test_labels, pred1)
row_sums = conf_mx.sum(axis=1, keepdims=True)
norm_conf_mx = np.round((conf_mx / row_sums), 2)
In [66]:
plot_confusion_matrix(norm_conf_mx)
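The row normalization above divides each count by the number of true examples in that class, so every row sums to roughly 1 (up to rounding) and the diagonal reads as per-class recall. A tiny sketch with made-up counts:

```python
import numpy as np

# Toy 2-class confusion matrix: rows = true class, columns = predicted class.
conf_mx = np.array([[45,  5],
                    [10, 40]])
row_sums = conf_mx.sum(axis=1, keepdims=True)
norm_conf_mx = np.round(conf_mx / row_sums, 2)
print(norm_conf_mx)  # diagonal entries 0.9 and 0.8 are the per-class recalls
```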

Visualize predictions

In [67]:
preds = model.predict(test_images_norm)
preds.shape
Out[67]:
(10000, 10)
In [68]:
cm = sns.light_palette((260, 75, 60), input="husl", as_cmap=True)
In [69]:
df = pd.DataFrame(preds[0:20], columns = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'])
df.style.format("{:.2%}").background_gradient(cmap=cm) 
Out[69]:
  airplane automobile bird cat deer dog frog horse ship truck
0 0.00% 0.00% 0.02% 99.65% 0.00% 0.00% 0.32% 0.00% 0.00% 0.00%
1 0.29% 26.43% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 73.27% 0.00%
2 0.94% 81.86% 0.00% 0.08% 0.18% 0.00% 0.00% 0.66% 10.98% 5.29%
3 98.28% 1.36% 0.06% 0.00% 0.09% 0.00% 0.00% 0.00% 0.17% 0.04%
4 0.00% 0.00% 0.18% 3.31% 10.45% 0.00% 86.06% 0.00% 0.00% 0.00%
5 0.00% 0.00% 0.00% 0.25% 0.00% 0.12% 99.60% 0.02% 0.00% 0.01%
6 0.00% 99.35% 0.00% 0.01% 0.00% 0.00% 0.00% 0.00% 0.00% 0.64%
7 1.10% 0.00% 2.76% 0.14% 6.25% 0.14% 89.58% 0.01% 0.00% 0.01%
8 0.00% 0.00% 0.00% 98.96% 0.63% 0.35% 0.02% 0.03% 0.00% 0.00%
9 0.00% 99.51% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.49%
10 71.39% 0.01% 1.88% 0.04% 23.47% 2.10% 0.49% 0.07% 0.53% 0.01%
11 0.00% 0.01% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 99.99%
12 0.00% 0.01% 26.43% 3.42% 0.17% 69.20% 0.72% 0.04% 0.00% 0.00%
13 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% 0.00%
14 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00%
15 0.00% 0.03% 0.00% 0.01% 0.01% 0.00% 34.33% 0.00% 65.62% 0.00%
16 0.00% 0.00% 0.00% 0.00% 0.00% 99.96% 0.00% 0.03% 0.00% 0.00%
17 0.05% 0.00% 0.08% 3.80% 0.19% 0.36% 0.43% 94.79% 0.00% 0.31%
18 0.08% 0.01% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 99.77% 0.14%
19 0.00% 0.00% 0.00% 0.01% 0.04% 0.00% 99.94% 0.00% 0.00% 0.00%

Plot feature map

In [70]:
(_,_), (test_images, test_labels) = tf.keras.datasets.cifar10.load_data()

img = test_images[2004]
img_tensor = image.img_to_array(img)
img_tensor = np.expand_dims(img_tensor, axis=0)

class_names = ['airplane'
,'automobile'
,'bird'
,'cat'
,'deer'
,'dog'
,'frog' 
,'horse'
,'ship'
,'truck']

plt.imshow(img, cmap='viridis')
plt.axis('off')
plt.show()
In [71]:
# Extracts the outputs of all layers (this model has 7, so the [:8] slice keeps every layer):
layer_outputs = [layer.output for layer in model.layers[:8]]
# Creates a model that will return these outputs, given the model input:
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)
In [72]:
activations = activation_model.predict(img_tensor)
len(activations)
Out[72]:
7
In [73]:
layer_names = []
for layer in model.layers:
    layer_names.append(layer.name)
    
layer_names
Out[73]:
['conv2d',
 'max_pooling2d',
 'conv2d_1',
 'max_pooling2d_1',
 'flatten_2',
 'dense_5',
 'output_layer']
In [74]:
# These are the names of the layers, so we can have them as part of our plot
layer_names = []
for layer in model.layers[:4]:
    layer_names.append(layer.name)

images_per_row = 16

# Now let's display our feature maps
for layer_name, layer_activation in zip(layer_names, activations):
    # This is the number of features in the feature map
    n_features = layer_activation.shape[-1]

    # The feature map has shape (1, size, size, n_features)
    size = layer_activation.shape[1]

    # We will tile the activation channels in this matrix
    n_cols = n_features // images_per_row
    display_grid = np.zeros((size * n_cols, images_per_row * size))

    # We'll tile each filter into this big horizontal grid
    for col in range(n_cols):
        for row in range(images_per_row):
            channel_image = layer_activation[0,
                                             :, :,
                                             col * images_per_row + row]
            # Post-process the feature to make it visually palatable
            channel_image -= channel_image.mean()
            channel_image /= channel_image.std()
            channel_image *= 64
            channel_image += 128
            channel_image = np.clip(channel_image, 0, 255).astype('uint8')
            display_grid[col * size : (col + 1) * size,
                         row * size : (row + 1) * size] = channel_image

    # Display the grid
    scale = 1. / size
    plt.figure(figsize=(scale * display_grid.shape[1],
                        scale * display_grid.shape[0]))
    plt.title(layer_name)
    plt.grid(False)
    plt.imshow(display_grid, aspect='auto', cmap='viridis')
    
plt.show();
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:28: RuntimeWarning: invalid value encountered in true_divide
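The RuntimeWarning comes from channels whose activations are uniformly zero after ReLU: their standard deviation is 0, so the in-place divide produces NaNs. A sketch of one possible guard (the eps value is an illustrative choice, not part of the notebook):

```python
import numpy as np

def normalize_channel(channel_image, eps=1e-8):
    """Scale one feature-map channel to the 0-255 range, guarding the
    zero-variance case that triggers the true_divide warning above."""
    channel_image = channel_image - channel_image.mean()
    channel_image = channel_image / (channel_image.std() + eps)
    channel_image = channel_image * 64 + 128
    return np.clip(channel_image, 0, 255).astype('uint8')

# A dead (all-zero after ReLU) channel now maps to a flat gray tile:
print(normalize_channel(np.zeros((4, 4)))[0, 0])  # 128
```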

Plot t-SNE embedding

In [75]:
# Extracts the outputs of all layers:
layer_outputs = [layer.output for layer in model.layers]

# Creates a model that will return these outputs, given the model input:
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)

# Get activation values for the last hidden dense layer and the output layer
activations = activation_model.predict(valid_images_norm[:5000])
dense_layer_activations = activations[-2]
output_layer_activations = activations[-1]
In [76]:
# Reduce the dimensionality using t-SNE to visualize in a scatterplot
tsne = TSNE(n_components=2, verbose=1, perplexity=40, n_iter=300)
tsne_results = tsne.fit_transform(dense_layer_activations)

# Scaling
tsne_results = (tsne_results - tsne_results.min()) / (tsne_results.max() - tsne_results.min())
/usr/local/lib/python3.7/dist-packages/sklearn/manifold/_t_sne.py:783: FutureWarning: The default initialization in TSNE will change from 'random' to 'pca' in 1.2.
  FutureWarning,
/usr/local/lib/python3.7/dist-packages/sklearn/manifold/_t_sne.py:793: FutureWarning: The default learning rate in TSNE will change from 200.0 to 'auto' in 1.2.
  FutureWarning,
[t-SNE] Computing 121 nearest neighbors...
[t-SNE] Indexed 5000 samples in 0.001s...
[t-SNE] Computed neighbors for 5000 samples in 1.016s...
[t-SNE] Computed conditional probabilities for sample 1000 / 5000
[t-SNE] Computed conditional probabilities for sample 2000 / 5000
[t-SNE] Computed conditional probabilities for sample 3000 / 5000
[t-SNE] Computed conditional probabilities for sample 4000 / 5000
[t-SNE] Computed conditional probabilities for sample 5000 / 5000
[t-SNE] Mean sigma: 4.432542
[t-SNE] KL divergence after 250 iterations with early exaggeration: 81.325729
[t-SNE] KL divergence after 300 iterations: 2.875303
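The scaling step above is a global min-max normalization: one min and one max over the whole array put the embedding into the unit square, so the fixed 0.02 distance threshold used when overlaying thumbnails in the next cell works on a known scale. A tiny sketch with made-up points:

```python
import numpy as np

# Global min-max scaling: both t-SNE axes are squeezed into [0, 1]
# together, so the embedding's relative geometry is preserved.
pts = np.array([[-4.0, 2.0],
                [ 0.0, 6.0]])
scaled = (pts - pts.min()) / (pts.max() - pts.min())
print(scaled)  # all values now lie in [0, 1]
```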
In [77]:
cmap = plt.cm.tab10
plt.figure(figsize=(16,10))
scatter = plt.scatter(tsne_results[:,0],tsne_results[:,1], c=valid_labels_split[:5000], s=10, cmap=cmap)
plt.legend(handles=scatter.legend_elements()[0], labels=class_names)

image_positions = np.array([[1., 1.]])
for index, position in enumerate(tsne_results):
    dist = np.sum((position - image_positions) ** 2, axis=1)
    if np.min(dist) > 0.02: # if far enough from other images
        image_positions = np.r_[image_positions, [position]]
        imagebox = mpl.offsetbox.AnnotationBbox(
            mpl.offsetbox.OffsetImage(train_images[index], cmap="binary"),
            position, bboxprops={"lw": 1})
        plt.gca().add_artist(imagebox)
plt.axis("off")
plt.show()

Experiment 4

CNN with 3 convolution/max pooling layers (no regularization)

Create the Model

Build CNN Model

In [78]:
model = models.Sequential()

model.add(layers.Conv2D(filters=64, kernel_size=(3, 3), strides=(1, 1), activation=tf.nn.relu, input_shape=(32, 32, 3)))
model.add(layers.MaxPool2D(pool_size=(2, 2), strides=2))

model.add(layers.Conv2D(filters=108, kernel_size=(3, 3), strides=(1, 1), activation=tf.nn.relu))
model.add(layers.MaxPool2D(pool_size=(2, 2), strides=2))

model.add(layers.Conv2D(filters=180, kernel_size=(3, 3), strides=(1, 1), activation=tf.nn.relu))
model.add(layers.MaxPool2D(pool_size=(2, 2), strides=2))

model.add(layers.Flatten())
model.add(layers.Dense(units=210, activation=tf.nn.relu))
model.add(layers.Dense(units=10, activation=tf.nn.softmax, name="output_layer"))
In [79]:
model.summary()
Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d_2 (Conv2D)           (None, 30, 30, 64)        1792      
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 15, 15, 64)       0         
 2D)                                                             
                                                                 
 conv2d_3 (Conv2D)           (None, 13, 13, 108)       62316     
                                                                 
 max_pooling2d_3 (MaxPooling  (None, 6, 6, 108)        0         
 2D)                                                             
                                                                 
 conv2d_4 (Conv2D)           (None, 4, 4, 180)         175140    
                                                                 
 max_pooling2d_4 (MaxPooling  (None, 2, 2, 180)        0         
 2D)                                                             
                                                                 
 flatten_3 (Flatten)         (None, 720)               0         
                                                                 
 dense_6 (Dense)             (None, 210)               151410    
                                                                 
 output_layer (Dense)        (None, 10)                2110      
                                                                 
=================================================================
Total params: 392,768
Trainable params: 392,768
Non-trainable params: 0
_________________________________________________________________
In [80]:
keras.utils.plot_model(model, "CIFAR10.png", show_shapes=True) 
Out[80]:

Compiling the model

tf.keras.losses.SparseCategoricalCrossentropy
https://www.tensorflow.org/api_docs/python/tf/keras/losses/SparseCategoricalCrossentropy
In [81]:
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['accuracy'])

Training the model

In [82]:
start = time.time()

history = model.fit(train_images_norm
                    ,train_labels_split
                    ,epochs=30
                    ,batch_size=500
                    ,validation_data=(valid_images_norm, valid_labels_split)
                   )

print("Total time: ", time.time() - start, "seconds")
Epoch 1/30
90/90 [==============================] - 3s 30ms/step - loss: 1.8360 - accuracy: 0.3301 - val_loss: 1.5425 - val_accuracy: 0.4400
Epoch 2/30
90/90 [==============================] - 2s 27ms/step - loss: 1.4617 - accuracy: 0.4724 - val_loss: 1.3545 - val_accuracy: 0.5178
Epoch 3/30
90/90 [==============================] - 2s 27ms/step - loss: 1.2996 - accuracy: 0.5355 - val_loss: 1.2411 - val_accuracy: 0.5546
Epoch 4/30
90/90 [==============================] - 2s 26ms/step - loss: 1.1961 - accuracy: 0.5773 - val_loss: 1.2051 - val_accuracy: 0.5722
Epoch 5/30
90/90 [==============================] - 2s 27ms/step - loss: 1.1145 - accuracy: 0.6134 - val_loss: 1.0866 - val_accuracy: 0.6126
Epoch 6/30
90/90 [==============================] - 2s 27ms/step - loss: 1.0394 - accuracy: 0.6394 - val_loss: 1.0479 - val_accuracy: 0.6284
Epoch 7/30
90/90 [==============================] - 2s 27ms/step - loss: 0.9886 - accuracy: 0.6578 - val_loss: 1.0262 - val_accuracy: 0.6324
Epoch 8/30
90/90 [==============================] - 2s 27ms/step - loss: 0.9482 - accuracy: 0.6732 - val_loss: 0.9769 - val_accuracy: 0.6554
Epoch 9/30
90/90 [==============================] - 2s 27ms/step - loss: 0.8960 - accuracy: 0.6925 - val_loss: 1.0047 - val_accuracy: 0.6470
Epoch 10/30
90/90 [==============================] - 2s 27ms/step - loss: 0.8434 - accuracy: 0.7102 - val_loss: 0.9013 - val_accuracy: 0.6846
Epoch 11/30
90/90 [==============================] - 2s 28ms/step - loss: 0.8130 - accuracy: 0.7202 - val_loss: 0.9051 - val_accuracy: 0.6876
Epoch 12/30
90/90 [==============================] - 2s 27ms/step - loss: 0.7687 - accuracy: 0.7353 - val_loss: 0.8632 - val_accuracy: 0.7070
Epoch 13/30
90/90 [==============================] - 2s 27ms/step - loss: 0.7463 - accuracy: 0.7424 - val_loss: 0.8669 - val_accuracy: 0.6956
Epoch 14/30
90/90 [==============================] - 3s 32ms/step - loss: 0.7076 - accuracy: 0.7569 - val_loss: 0.8967 - val_accuracy: 0.6984
Epoch 15/30
90/90 [==============================] - 2s 27ms/step - loss: 0.6820 - accuracy: 0.7670 - val_loss: 0.8930 - val_accuracy: 0.6942
Epoch 16/30
90/90 [==============================] - 2s 27ms/step - loss: 0.6596 - accuracy: 0.7733 - val_loss: 0.8579 - val_accuracy: 0.7114
Epoch 17/30
90/90 [==============================] - 3s 32ms/step - loss: 0.6271 - accuracy: 0.7840 - val_loss: 0.8392 - val_accuracy: 0.7140
Epoch 18/30
90/90 [==============================] - 2s 27ms/step - loss: 0.5902 - accuracy: 0.7962 - val_loss: 0.8518 - val_accuracy: 0.7144
Epoch 19/30
90/90 [==============================] - 2s 27ms/step - loss: 0.5581 - accuracy: 0.8088 - val_loss: 0.8258 - val_accuracy: 0.7216
Epoch 20/30
90/90 [==============================] - 2s 27ms/step - loss: 0.5345 - accuracy: 0.8176 - val_loss: 0.8465 - val_accuracy: 0.7194
Epoch 21/30
90/90 [==============================] - 3s 28ms/step - loss: 0.5032 - accuracy: 0.8289 - val_loss: 0.8414 - val_accuracy: 0.7248
Epoch 22/30
90/90 [==============================] - 2s 27ms/step - loss: 0.4765 - accuracy: 0.8370 - val_loss: 0.8345 - val_accuracy: 0.7288
Epoch 23/30
90/90 [==============================] - 2s 27ms/step - loss: 0.4663 - accuracy: 0.8392 - val_loss: 0.8306 - val_accuracy: 0.7318
Epoch 24/30
90/90 [==============================] - 2s 27ms/step - loss: 0.4285 - accuracy: 0.8558 - val_loss: 0.9041 - val_accuracy: 0.7192
Epoch 25/30
90/90 [==============================] - 2s 27ms/step - loss: 0.4140 - accuracy: 0.8595 - val_loss: 0.8664 - val_accuracy: 0.7272
Epoch 26/30
90/90 [==============================] - 2s 28ms/step - loss: 0.3872 - accuracy: 0.8687 - val_loss: 0.9073 - val_accuracy: 0.7192
Epoch 27/30
90/90 [==============================] - 2s 27ms/step - loss: 0.3729 - accuracy: 0.8728 - val_loss: 0.9107 - val_accuracy: 0.7258
Epoch 28/30
90/90 [==============================] - 2s 28ms/step - loss: 0.3572 - accuracy: 0.8812 - val_loss: 0.9162 - val_accuracy: 0.7366
Epoch 29/30
90/90 [==============================] - 2s 28ms/step - loss: 0.3240 - accuracy: 0.8902 - val_loss: 0.9197 - val_accuracy: 0.7362
Epoch 30/30
90/90 [==============================] - 2s 27ms/step - loss: 0.2883 - accuracy: 0.9054 - val_loss: 0.9682 - val_accuracy: 0.7298
Total time:  76.63358283042908 seconds

Evaluate the model

In [83]:
loss, accuracy = model.evaluate(test_images_norm, test_labels)
print('test set accuracy: ', accuracy)
313/313 [==============================] - 1s 4ms/step - loss: 0.9989 - accuracy: 0.7161
test set accuracy:  0.7160999774932861

Plotting Performance Metrics

Matplotlib is used to create two plots side by side, displaying the training and validation loss and accuracy for each training epoch.

In [84]:
history_dict = history.history
history_dict.keys()
Out[84]:
dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])
In [85]:
losses = history.history['loss']
accs = history.history['accuracy']
val_losses = history.history['val_loss']
val_accs = history.history['val_accuracy']
epochs = len(losses)
In [86]:
plt.figure(figsize=(16, 4))
for i, metrics in enumerate(zip([losses, accs], [val_losses, val_accs], ['Loss', 'Accuracy'])):
    plt.subplot(1, 2, i + 1)
    plt.plot(range(epochs), metrics[0], label='Training {}'.format(metrics[2]))
    plt.plot(range(epochs), metrics[1], label='Validation {}'.format(metrics[2]))
    plt.legend()
plt.show()

Confusion matrices

Using sklearn.metrics, visualize the confusion matrix.

In [87]:
pred1 = model.predict(test_images_norm)
pred1 = np.argmax(pred1, axis=1)
In [88]:
conf_mx = confusion_matrix(test_labels, pred1)
row_sums = conf_mx.sum(axis=1, keepdims=True)
norm_conf_mx = np.round((conf_mx / row_sums), 2)
In [89]:
plot_confusion_matrix(norm_conf_mx)

Visualize predictions

In [90]:
preds = model.predict(test_images_norm)
preds.shape
Out[90]:
(10000, 10)
In [91]:
cm = sns.light_palette((260, 75, 60), input="husl", as_cmap=True)
In [92]:
df = pd.DataFrame(preds[0:20], columns = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'])
df.style.format("{:.2%}").background_gradient(cmap=cm) 
Out[92]:
  airplane automobile bird cat deer dog frog horse ship truck
0 0.44% 0.24% 1.06% 81.24% 0.11% 15.00% 0.27% 0.03% 1.60% 0.00%
1 0.16% 57.09% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 42.75% 0.00%
2 0.51% 0.48% 0.00% 0.02% 0.00% 0.00% 0.01% 0.12% 98.63% 0.23%
3 24.98% 0.04% 0.47% 0.06% 73.00% 0.01% 0.00% 0.00% 1.44% 0.00%
4 0.00% 0.00% 4.89% 0.58% 18.70% 0.01% 75.83% 0.00% 0.00% 0.00%
5 0.00% 0.00% 0.05% 1.02% 0.25% 0.33% 98.35% 0.00% 0.00% 0.00%
6 6.10% 53.83% 1.06% 2.45% 0.08% 7.17% 0.08% 0.10% 0.00% 29.13%
7 0.24% 0.00% 55.39% 0.20% 0.13% 0.03% 43.99% 0.00% 0.01% 0.00%
8 0.01% 0.00% 14.70% 71.27% 5.57% 4.29% 3.77% 0.38% 0.00% 0.00%
9 0.05% 98.76% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 1.19%
10 76.25% 0.00% 10.64% 0.02% 13.05% 0.01% 0.00% 0.02% 0.02% 0.00%
11 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00%
12 0.00% 0.00% 11.10% 0.60% 2.23% 83.12% 0.07% 2.89% 0.00% 0.00%
13 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% 0.00%
14 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00%
15 0.09% 0.00% 0.00% 0.00% 0.01% 0.00% 0.09% 0.00% 99.80% 0.00%
16 0.00% 0.00% 0.36% 1.00% 0.00% 96.43% 0.03% 2.16% 0.00% 0.01%
17 0.69% 0.00% 1.96% 5.07% 0.45% 34.53% 0.06% 56.67% 0.38% 0.18%
18 0.43% 0.04% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 99.37% 0.16%
19 0.00% 0.00% 0.00% 0.01% 0.01% 0.00% 99.98% 0.00% 0.00% 0.00%

Plot feature map

In [93]:
(_,_), (test_images, test_labels) = tf.keras.datasets.cifar10.load_data()

img = test_images[2004]
img_tensor = image.img_to_array(img)
img_tensor = np.expand_dims(img_tensor, axis=0)

class_names = ['airplane'
,'automobile'
,'bird'
,'cat'
,'deer'
,'dog'
,'frog' 
,'horse'
,'ship'
,'truck']

plt.imshow(img, cmap='viridis')
plt.axis('off')
plt.show()
In [94]:
# Extracts the outputs of the first 8 layers:
layer_outputs = [layer.output for layer in model.layers[:8]]
# Creates a model that will return these outputs, given the model input:
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)
In [95]:
activations = activation_model.predict(img_tensor)
len(activations)
Out[95]:
8
In [96]:
layer_names = []
for layer in model.layers:
    layer_names.append(layer.name)
    
layer_names
Out[96]:
['conv2d_2',
 'max_pooling2d_2',
 'conv2d_3',
 'max_pooling2d_3',
 'conv2d_4',
 'max_pooling2d_4',
 'flatten_3',
 'dense_6',
 'output_layer']
In [97]:
# These are the names of the layers, so we can have them as part of our plot
layer_names = []
for layer in model.layers[:6]:
    layer_names.append(layer.name)

images_per_row = 16

# Now let's display our feature maps
for layer_name, layer_activation in zip(layer_names, activations):
    # This is the number of features in the feature map
    n_features = layer_activation.shape[-1]

    # The feature map has shape (1, size, size, n_features)
    size = layer_activation.shape[1]

    # We will tile the activation channels in this matrix
    n_cols = n_features // images_per_row
    display_grid = np.zeros((size * n_cols, images_per_row * size))

    # We'll tile each filter into this big horizontal grid
    for col in range(n_cols):
        for row in range(images_per_row):
            channel_image = layer_activation[0,
                                             :, :,
                                             col * images_per_row + row]
            # Post-process the feature to make it visually palatable
            channel_image -= channel_image.mean()
            channel_image /= channel_image.std()
            channel_image *= 64
            channel_image += 128
            channel_image = np.clip(channel_image, 0, 255).astype('uint8')
            display_grid[col * size : (col + 1) * size,
                         row * size : (row + 1) * size] = channel_image

    # Display the grid
    scale = 1. / size
    plt.figure(figsize=(scale * display_grid.shape[1],
                        scale * display_grid.shape[0]))
    plt.title(layer_name)
    plt.grid(False)
    plt.imshow(display_grid, aspect='auto', cmap='viridis')
    
plt.show();
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:28: RuntimeWarning: invalid value encountered in true_divide

Plot t-SNE embedding

In [98]:
# Extracts the outputs of all layers:
layer_outputs = [layer.output for layer in model.layers]

# Creates a model that will return these outputs, given the model input:
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)

# Get activation values for the last hidden dense layer and the output layer
activations = activation_model.predict(valid_images_norm[:5000])
dense_layer_activations = activations[-2]
output_layer_activations = activations[-1]
In [99]:
# Reduce the dimensionality using t-SNE to visualize in a scatterplot
tsne = TSNE(n_components=2, verbose=1, perplexity=40, n_iter=300)
tsne_results = tsne.fit_transform(dense_layer_activations)

# Scaling
tsne_results = (tsne_results - tsne_results.min()) / (tsne_results.max() - tsne_results.min())
/usr/local/lib/python3.7/dist-packages/sklearn/manifold/_t_sne.py:783: FutureWarning: The default initialization in TSNE will change from 'random' to 'pca' in 1.2.
  FutureWarning,
/usr/local/lib/python3.7/dist-packages/sklearn/manifold/_t_sne.py:793: FutureWarning: The default learning rate in TSNE will change from 200.0 to 'auto' in 1.2.
  FutureWarning,
[t-SNE] Computing 121 nearest neighbors...
[t-SNE] Indexed 5000 samples in 0.002s...
[t-SNE] Computed neighbors for 5000 samples in 1.054s...
[t-SNE] Computed conditional probabilities for sample 1000 / 5000
[t-SNE] Computed conditional probabilities for sample 2000 / 5000
[t-SNE] Computed conditional probabilities for sample 3000 / 5000
[t-SNE] Computed conditional probabilities for sample 4000 / 5000
[t-SNE] Computed conditional probabilities for sample 5000 / 5000
[t-SNE] Mean sigma: 3.150424
[t-SNE] KL divergence after 250 iterations with early exaggeration: 80.564735
[t-SNE] KL divergence after 300 iterations: 2.680486
In [100]:
cmap = plt.cm.tab10
plt.figure(figsize=(16,10))
scatter = plt.scatter(tsne_results[:,0],tsne_results[:,1], c=valid_labels_split[:5000], s=10, cmap=cmap)
plt.legend(handles=scatter.legend_elements()[0], labels=class_names)

image_positions = np.array([[1., 1.]])
for index, position in enumerate(tsne_results):
    dist = np.sum((position - image_positions) ** 2, axis=1)
    if np.min(dist) > 0.02: # if far enough from other images
        image_positions = np.r_[image_positions, [position]]
        imagebox = mpl.offsetbox.AnnotationBbox(
            mpl.offsetbox.OffsetImage(train_images[index], cmap="binary"),
            position, bboxprops={"lw": 1})
        plt.gca().add_artist(imagebox)
plt.axis("off")
plt.show()

Experiment 5

DNN with 2 hidden layers and regularization (batch normalization, dropout, early stopping)

Create the Model

Build DNN Model

In [101]:
model = models.Sequential()
model.add(layers.Flatten(input_shape=(32, 32, 3)))
model.add(layers.BatchNormalization())

model.add(layers.Dense(units=108, activation=tf.nn.relu))
model.add(layers.Dropout(0.3))
model.add(layers.Dense(units=200, activation=tf.nn.relu))
model.add(layers.Dropout(0.3))
model.add(layers.Dense(units=10, activation=tf.nn.softmax, name="output_layer"))
In [102]:
model.summary()
Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 flatten_4 (Flatten)         (None, 3072)              0         
                                                                 
 batch_normalization (BatchN  (None, 3072)             12288     
 ormalization)                                                   
                                                                 
 dense_7 (Dense)             (None, 108)               331884    
                                                                 
 dropout (Dropout)           (None, 108)               0         
                                                                 
 dense_8 (Dense)             (None, 200)               21800     
                                                                 
 dropout_1 (Dropout)         (None, 200)               0         
                                                                 
 output_layer (Dense)        (None, 10)                2010      
                                                                 
=================================================================
Total params: 367,982
Trainable params: 361,838
Non-trainable params: 6,144
_________________________________________________________________
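The parameter counts in the summary above can be checked by hand: a Dense layer has in_features * units + units parameters, and BatchNormalization keeps four vectors of the input width (gamma and beta are trainable; the moving mean and variance are not, which is where the 6,144 non-trainable parameters come from). A quick arithmetic check (helper names here are illustrative):

```python
def dense_params(n_in, units):
    """Weights plus biases of a fully connected layer."""
    return n_in * units + units

def batchnorm_params(n_in):
    """gamma, beta, moving mean, moving variance: four vectors of size n_in."""
    return 4 * n_in

print(batchnorm_params(3072))   # 12288 (6144 trainable + 6144 non-trainable)
print(dense_params(3072, 108))  # 331884
print(dense_params(108, 200))   # 21800
print(dense_params(200, 10))    # 2010
```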
In [103]:
keras.utils.plot_model(model, "CIFAR10.png", show_shapes=True) 
Out[103]:

Compiling the model

tf.keras.losses.SparseCategoricalCrossentropy
https://www.tensorflow.org/api_docs/python/tf/keras/losses/SparseCategoricalCrossentropy
In [104]:
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['accuracy'])

Training the model

tf.keras.callbacks.EarlyStopping
https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/EarlyStopping
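The run below stops before 30 epochs because EarlyStopping(monitor='val_accuracy', patience=3) halts training once validation accuracy has failed to improve for three consecutive epochs. A small pure-Python sketch of that stopping rule (the function and the simulated values are illustrative, not part of the Keras API):

```python
def early_stop_epoch(val_metric, patience=3):
    """Return the 1-based epoch at which training stops, mimicking
    EarlyStopping on a maximized metric: halt after `patience`
    consecutive epochs without a new best value."""
    best = float("-inf")
    wait = 0
    for epoch, value in enumerate(val_metric, start=1):
        if value > best:
            best, wait = value, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_metric)

# Example: the best value arrives at epoch 3; three worse epochs follow.
print(early_stop_epoch([0.30, 0.40, 0.50, 0.48, 0.49, 0.47, 0.46]))  # 6
```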
In [105]:
start = time.time()

history = model.fit(train_images_norm
                    ,train_labels_split
                    ,epochs=30
                    ,batch_size=500
                    ,validation_data=(valid_images_norm, valid_labels_split)
                    ,callbacks=[tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=3)],
                   )

print("Total time: ", time.time() - start, "seconds")
Epoch 1/30
90/90 [==============================] - 1s 10ms/step - loss: 1.9901 - accuracy: 0.3066 - val_loss: 1.9786 - val_accuracy: 0.3528
Epoch 2/30
90/90 [==============================] - 1s 7ms/step - loss: 1.7454 - accuracy: 0.3816 - val_loss: 1.8003 - val_accuracy: 0.4198
Epoch 3/30
90/90 [==============================] - 1s 8ms/step - loss: 1.6634 - accuracy: 0.4111 - val_loss: 1.6665 - val_accuracy: 0.4446
Epoch 4/30
90/90 [==============================] - 1s 8ms/step - loss: 1.6043 - accuracy: 0.4295 - val_loss: 1.5723 - val_accuracy: 0.4642
Epoch 5/30
90/90 [==============================] - 1s 8ms/step - loss: 1.5664 - accuracy: 0.4463 - val_loss: 1.5152 - val_accuracy: 0.4590
Epoch 6/30
90/90 [==============================] - 1s 7ms/step - loss: 1.5296 - accuracy: 0.4566 - val_loss: 1.4934 - val_accuracy: 0.4684
Epoch 7/30
90/90 [==============================] - 1s 7ms/step - loss: 1.4920 - accuracy: 0.4709 - val_loss: 1.4603 - val_accuracy: 0.4808
Epoch 8/30
90/90 [==============================] - 1s 7ms/step - loss: 1.4682 - accuracy: 0.4824 - val_loss: 1.4479 - val_accuracy: 0.4936
Epoch 9/30
90/90 [==============================] - 1s 7ms/step - loss: 1.4483 - accuracy: 0.4871 - val_loss: 1.4364 - val_accuracy: 0.4990
Epoch 10/30
90/90 [==============================] - 1s 7ms/step - loss: 1.4249 - accuracy: 0.4949 - val_loss: 1.4149 - val_accuracy: 0.4914
Epoch 11/30
90/90 [==============================] - 1s 7ms/step - loss: 1.4096 - accuracy: 0.4981 - val_loss: 1.4033 - val_accuracy: 0.5010
Epoch 12/30
90/90 [==============================] - 1s 8ms/step - loss: 1.3864 - accuracy: 0.5076 - val_loss: 1.3913 - val_accuracy: 0.5060
Epoch 13/30
90/90 [==============================] - 1s 8ms/step - loss: 1.3735 - accuracy: 0.5128 - val_loss: 1.3801 - val_accuracy: 0.5032
Epoch 14/30
90/90 [==============================] - 1s 8ms/step - loss: 1.3560 - accuracy: 0.5160 - val_loss: 1.3817 - val_accuracy: 0.5032
Epoch 15/30
90/90 [==============================] - 1s 7ms/step - loss: 1.3401 - accuracy: 0.5208 - val_loss: 1.3804 - val_accuracy: 0.5082
Epoch 16/30
90/90 [==============================] - 1s 7ms/step - loss: 1.3290 - accuracy: 0.5268 - val_loss: 1.3721 - val_accuracy: 0.5170
Epoch 17/30
90/90 [==============================] - 1s 7ms/step - loss: 1.3173 - accuracy: 0.5306 - val_loss: 1.3702 - val_accuracy: 0.5172
Epoch 18/30
90/90 [==============================] - 1s 7ms/step - loss: 1.3043 - accuracy: 0.5358 - val_loss: 1.3534 - val_accuracy: 0.5212
Epoch 19/30
90/90 [==============================] - 1s 8ms/step - loss: 1.2913 - accuracy: 0.5394 - val_loss: 1.3599 - val_accuracy: 0.5114
Epoch 20/30
90/90 [==============================] - 1s 8ms/step - loss: 1.2768 - accuracy: 0.5433 - val_loss: 1.3552 - val_accuracy: 0.5160
Epoch 21/30
90/90 [==============================] - 1s 7ms/step - loss: 1.2703 - accuracy: 0.5448 - val_loss: 1.3536 - val_accuracy: 0.5188
Total time:  15.91767692565918 seconds

Evaluate the model

To verify that the model has not simply memorized the training data, evaluate its performance on the held-out test set.

In [106]:
loss, accuracy = model.evaluate(test_images_norm, test_labels)
print('test set accuracy: ', accuracy)
313/313 [==============================] - 1s 3ms/step - loss: 1.3253 - accuracy: 0.5302
test set accuracy:  0.5302000045776367

Plotting Performance Metrics

Matplotlib is used to create two side-by-side plots displaying the training and validation loss (left) and accuracy (right) for each training epoch.

In [107]:
history_dict = history.history
history_dict.keys()
Out[107]:
dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])
In [108]:
losses = history.history['loss']
accs = history.history['accuracy']
val_losses = history.history['val_loss']
val_accs = history.history['val_accuracy']
epochs = len(losses)
In [109]:
plt.figure(figsize=(16, 4))
for i, metrics in enumerate(zip([losses, accs], [val_losses, val_accs], ['Loss', 'Accuracy'])):
    plt.subplot(1, 2, i + 1)
    plt.plot(range(epochs), metrics[0], label='Training {}'.format(metrics[2]))
    plt.plot(range(epochs), metrics[1], label='Validation {}'.format(metrics[2]))
    plt.legend()
plt.show()

Confusion matrices

Using sklearn.metrics, visualize the confusion matrix.
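plot_confusion_matrix is a helper defined earlier in the notebook, outside this section. A minimal sketch of what such a helper might look like, assuming it takes the row-normalized matrix computed below (the function name here is illustrative, not the notebook's actual definition):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_confusion_matrix_sketch(matrix, labels=None):
    """Heatmap of a (row-normalized) confusion matrix with cell annotations."""
    fig, ax = plt.subplots(figsize=(8, 8))
    ax.matshow(matrix, cmap=plt.cm.Blues)
    for (i, j), value in np.ndenumerate(matrix):
        ax.text(j, i, f"{value:.2f}", ha="center", va="center")
    if labels is not None:
        ax.set_xticks(range(len(labels)))
        ax.set_xticklabels(labels, rotation=90)
        ax.set_yticks(range(len(labels)))
        ax.set_yticklabels(labels)
    ax.set_xlabel("Predicted")
    ax.set_ylabel("Actual")
    return ax
```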

In [110]:
pred1 = model.predict(test_images_norm)
pred1 = np.argmax(pred1, axis=1)
In [111]:
conf_mx = confusion_matrix(test_labels, pred1)
row_sums = conf_mx.sum(axis=1, keepdims=True)
norm_conf_mx = np.round((conf_mx / row_sums), 2)
In [112]:
plot_confusion_matrix(norm_conf_mx)

Visualize predictions

In [113]:
preds = model.predict(test_images_norm)
preds.shape
Out[113]:
(10000, 10)
In [114]:
cm = sns.light_palette((260, 75, 60), input="husl", as_cmap=True)
In [115]:
df = pd.DataFrame(preds[0:20], columns = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'])
df.style.format("{:.2%}").background_gradient(cmap=cm) 
Out[115]:
  airplane automobile bird cat deer dog frog horse ship truck
0 5.75% 3.66% 9.06% 40.34% 5.99% 24.12% 4.04% 1.43% 5.13% 0.48%
1 3.22% 5.94% 0.01% 0.00% 0.00% 0.00% 0.00% 0.00% 44.17% 46.64%
2 24.36% 8.28% 2.68% 1.42% 2.25% 0.77% 0.06% 1.08% 47.84% 11.25%
3 22.01% 5.34% 7.52% 2.93% 9.38% 1.73% 0.36% 3.30% 44.40% 3.03%
4 0.12% 0.06% 8.47% 5.07% 54.31% 3.46% 28.14% 0.19% 0.16% 0.01%
5 0.93% 0.90% 3.88% 12.33% 2.24% 7.52% 69.19% 1.25% 0.29% 1.47%
6 1.92% 51.34% 2.12% 18.17% 0.05% 15.08% 0.72% 1.61% 0.60% 8.39%
7 0.19% 0.17% 0.87% 0.48% 0.10% 0.08% 97.99% 0.00% 0.02% 0.10%
8 0.62% 0.03% 12.35% 22.58% 13.79% 39.22% 1.77% 9.47% 0.10% 0.08%
9 4.97% 66.74% 1.22% 1.36% 0.36% 0.34% 0.04% 0.27% 7.77% 16.93%
10 46.87% 0.53% 11.43% 4.83% 5.13% 3.61% 1.19% 0.72% 25.11% 0.58%
11 0.39% 31.89% 0.04% 0.04% 0.01% 0.00% 0.00% 0.00% 18.06% 49.56%
12 3.16% 18.35% 8.54% 12.29% 7.31% 13.16% 10.38% 14.53% 5.22% 7.06%
13 15.79% 1.81% 2.80% 3.80% 1.23% 7.79% 0.60% 64.49% 0.46% 1.24%
14 0.30% 56.66% 0.15% 0.56% 0.00% 0.18% 0.02% 0.05% 0.22% 41.86%
15 3.57% 2.08% 0.92% 10.36% 3.38% 11.47% 2.99% 0.93% 63.39% 0.90%
16 4.10% 1.23% 9.14% 16.84% 4.37% 29.01% 1.07% 31.70% 1.05% 1.50%
17 11.31% 1.20% 17.14% 15.30% 18.96% 10.71% 3.64% 11.72% 5.17% 4.84%
18 2.86% 1.29% 0.06% 0.08% 0.12% 0.01% 0.00% 0.05% 94.62% 0.92%
19 0.04% 0.11% 1.30% 6.28% 2.15% 3.66% 84.76% 1.59% 0.01% 0.10%

Plot t-SNE embedding
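The 2-D t-SNE coordinates computed in the next cells are rescaled to the [0, 1] range with min-max normalization, so the 0.02 squared-distance threshold used when placing image thumbnails is independent of the raw embedding scale. The rescaling itself is just:

```python
import numpy as np

def minmax_scale(x):
    """Rescale an array to [0, 1] using its global min and max."""
    return (x - x.min()) / (x.max() - x.min())

pts = np.array([[2.0, 4.0], [6.0, 10.0]])
print(minmax_scale(pts))  # each value mapped as (x - 2) / 8
```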

In [116]:
# Extracts the outputs of all layers:
layer_outputs = [layer.output for layer in model.layers]

# Creates a model that will return these outputs, given the model input:
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)

# Get activations of a hidden dense layer and of the output layer
# (activations[-4] is a dropout output, which is an identity at inference)
activations = activation_model.predict(valid_images_norm[:5000])
dense_layer_activations = activations[-4]
output_layer_activations = activations[-1]
In [117]:
# Reduce the dimension using t-SNE to visualize in a scatterplot
tsne = TSNE(n_components=2, verbose=1, perplexity=40, n_iter=300)
tsne_results = tsne.fit_transform(dense_layer_activations)

# Scaling
tsne_results = (tsne_results - tsne_results.min()) / (tsne_results.max() - tsne_results.min())
[t-SNE] Computing 121 nearest neighbors...
[t-SNE] Indexed 5000 samples in 0.001s...
[t-SNE] Computed neighbors for 5000 samples in 0.726s...
[t-SNE] Computed conditional probabilities for sample 1000 / 5000
[t-SNE] Computed conditional probabilities for sample 2000 / 5000
[t-SNE] Computed conditional probabilities for sample 3000 / 5000
[t-SNE] Computed conditional probabilities for sample 4000 / 5000
[t-SNE] Computed conditional probabilities for sample 5000 / 5000
[t-SNE] Mean sigma: 5.067357
[t-SNE] KL divergence after 250 iterations with early exaggeration: 80.941147
[t-SNE] KL divergence after 300 iterations: 2.856572
In [118]:
cmap = plt.cm.tab10
plt.figure(figsize=(16,10))
scatter = plt.scatter(tsne_results[:,0],tsne_results[:,1], c=valid_labels_split[:5000], s=10, cmap=cmap)
plt.legend(handles=scatter.legend_elements()[0], labels=class_names)

image_positions = np.array([[1., 1.]])
for index, position in enumerate(tsne_results):
    dist = np.sum((position - image_positions) ** 2, axis=1)
    if np.min(dist) > 0.02: # if far enough from other images
        image_positions = np.r_[image_positions, [position]]
        imagebox = mpl.offsetbox.AnnotationBbox(
            mpl.offsetbox.OffsetImage(train_images[index], cmap="binary"),
            position, bboxprops={"lw": 1})
        plt.gca().add_artist(imagebox)
plt.axis("off")
plt.show()

Experiment 6

DNN with 3 hidden layers and regularization (batch normalization, L2 regularization, dropout, early stopping)
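Experiment 6 adds kernel_regularizer=L2(0.001) to each hidden Dense layer, which appends a penalty of 0.001 times the sum of squared kernel weights to the loss (this is why the reported training losses start higher than in Experiment 5). A numpy-only sketch of the penalty term (helper name is illustrative):

```python
import numpy as np

def l2_penalty(weights, factor=0.001):
    """L2 penalty as added to the loss: factor * sum of squared weights."""
    return factor * float(np.sum(np.square(weights)))

w = np.array([[1.0, -2.0], [3.0, 0.5]])
print(l2_penalty(w))  # 0.001 * (1 + 4 + 9 + 0.25) = 0.01425
```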

Create the Model

Build DNN Model

In [119]:
model = models.Sequential()
model.add(layers.Flatten(input_shape=(32, 32, 3)))
model.add(layers.BatchNormalization())

model.add(layers.Dense(units=108, activation=tf.nn.relu, kernel_regularizer=tf.keras.regularizers.L2(0.001)))
model.add(layers.Dropout(0.3))
model.add(layers.Dense(units=200, activation=tf.nn.relu, kernel_regularizer=tf.keras.regularizers.L2(0.001)))
model.add(layers.Dropout(0.3))
model.add(layers.Dense(units=282, activation=tf.nn.relu, kernel_regularizer=tf.keras.regularizers.L2(0.001)))
model.add(layers.Dropout(0.3))
model.add(layers.Dense(units=10, activation=tf.nn.softmax, name="output_layer"))
In [120]:
model.summary()
Model: "sequential_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 flatten_5 (Flatten)         (None, 3072)              0         
                                                                 
 batch_normalization_1 (Batc  (None, 3072)             12288     
 hNormalization)                                                 
                                                                 
 dense_9 (Dense)             (None, 108)               331884    
                                                                 
 dropout_2 (Dropout)         (None, 108)               0         
                                                                 
 dense_10 (Dense)            (None, 200)               21800     
                                                                 
 dropout_3 (Dropout)         (None, 200)               0         
                                                                 
 dense_11 (Dense)            (None, 282)               56682     
                                                                 
 dropout_4 (Dropout)         (None, 282)               0         
                                                                 
 output_layer (Dense)        (None, 10)                2830      
                                                                 
=================================================================
Total params: 425,484
Trainable params: 419,340
Non-trainable params: 6,144
_________________________________________________________________
In [121]:
keras.utils.plot_model(model, "CIFAR10.png", show_shapes=True) 
Out[121]:

Compiling the model

tf.keras.losses.SparseCategoricalCrossentropy
https://www.tensorflow.org/api_docs/python/tf/keras/losses/SparseCategoricalCrossentropy
In [122]:
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['accuracy'])

Training the model

In [123]:
start = time.time()

history = model.fit(train_images_norm
                    ,train_labels_split
                    ,epochs=30
                    ,batch_size=500
                    ,validation_data=(valid_images_norm, valid_labels_split)
                    ,callbacks=[tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=3)],
                   )

print("Total time: ", time.time() - start, "seconds")
Epoch 1/30
90/90 [==============================] - 2s 19ms/step - loss: 2.5250 - accuracy: 0.2909 - val_loss: 2.5056 - val_accuracy: 0.3684
Epoch 2/30
90/90 [==============================] - 1s 12ms/step - loss: 2.2334 - accuracy: 0.3690 - val_loss: 2.2489 - val_accuracy: 0.4150
Epoch 3/30
90/90 [==============================] - 1s 12ms/step - loss: 2.0843 - accuracy: 0.4002 - val_loss: 2.0301 - val_accuracy: 0.4452
Epoch 4/30
90/90 [==============================] - 1s 8ms/step - loss: 1.9720 - accuracy: 0.4199 - val_loss: 1.9016 - val_accuracy: 0.4654
Epoch 5/30
90/90 [==============================] - 1s 8ms/step - loss: 1.8804 - accuracy: 0.4376 - val_loss: 1.8068 - val_accuracy: 0.4670
Epoch 6/30
90/90 [==============================] - 1s 8ms/step - loss: 1.8153 - accuracy: 0.4446 - val_loss: 1.7397 - val_accuracy: 0.4656
Epoch 7/30
90/90 [==============================] - 1s 8ms/step - loss: 1.7574 - accuracy: 0.4555 - val_loss: 1.6971 - val_accuracy: 0.4840
Epoch 8/30
90/90 [==============================] - 1s 8ms/step - loss: 1.7158 - accuracy: 0.4652 - val_loss: 1.6490 - val_accuracy: 0.4772
Epoch 9/30
90/90 [==============================] - 1s 8ms/step - loss: 1.6824 - accuracy: 0.4717 - val_loss: 1.6331 - val_accuracy: 0.4840
Epoch 10/30
90/90 [==============================] - 1s 8ms/step - loss: 1.6478 - accuracy: 0.4806 - val_loss: 1.6042 - val_accuracy: 0.4930
Epoch 11/30
90/90 [==============================] - 1s 8ms/step - loss: 1.6281 - accuracy: 0.4830 - val_loss: 1.5985 - val_accuracy: 0.4942
Epoch 12/30
90/90 [==============================] - 1s 8ms/step - loss: 1.6074 - accuracy: 0.4868 - val_loss: 1.5822 - val_accuracy: 0.4958
Epoch 13/30
90/90 [==============================] - 1s 8ms/step - loss: 1.5895 - accuracy: 0.4949 - val_loss: 1.5542 - val_accuracy: 0.5008
Epoch 14/30
90/90 [==============================] - 1s 8ms/step - loss: 1.5658 - accuracy: 0.5011 - val_loss: 1.5597 - val_accuracy: 0.4986
Epoch 15/30
90/90 [==============================] - 1s 8ms/step - loss: 1.5611 - accuracy: 0.5017 - val_loss: 1.5488 - val_accuracy: 0.4930
Epoch 16/30
90/90 [==============================] - 1s 8ms/step - loss: 1.5413 - accuracy: 0.5067 - val_loss: 1.5447 - val_accuracy: 0.4922
Total time:  14.772281408309937 seconds

Evaluate the model

In [124]:
loss, accuracy = model.evaluate(test_images_norm, test_labels)
print('test set accuracy: ', accuracy)
313/313 [==============================] - 1s 3ms/step - loss: 1.5059 - accuracy: 0.5206
test set accuracy:  0.5206000208854675

Plotting Performance Metrics

Matplotlib is used to create two side-by-side plots displaying the training and validation loss (left) and accuracy (right) for each training epoch.

In [125]:
history_dict = history.history
history_dict.keys()
Out[125]:
dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])
In [126]:
losses = history.history['loss']
accs = history.history['accuracy']
val_losses = history.history['val_loss']
val_accs = history.history['val_accuracy']
epochs = len(losses)
In [127]:
plt.figure(figsize=(16, 4))
for i, metrics in enumerate(zip([losses, accs], [val_losses, val_accs], ['Loss', 'Accuracy'])):
    plt.subplot(1, 2, i + 1)
    plt.plot(range(epochs), metrics[0], label='Training {}'.format(metrics[2]))
    plt.plot(range(epochs), metrics[1], label='Validation {}'.format(metrics[2]))
    plt.legend()
plt.show()

Confusion matrices

Using sklearn.metrics, visualize the confusion matrix.

In [128]:
pred1 = model.predict(test_images_norm)
pred1 = np.argmax(pred1, axis=1)
In [129]:
conf_mx = confusion_matrix(test_labels, pred1)
row_sums = conf_mx.sum(axis=1, keepdims=True)
norm_conf_mx = np.round((conf_mx / row_sums), 2)
In [130]:
plot_confusion_matrix(norm_conf_mx)

Visualize predictions

In [131]:
preds = model.predict(test_images_norm)
preds.shape
Out[131]:
(10000, 10)
In [132]:
cm = sns.light_palette((260, 75, 60), input="husl", as_cmap=True)
In [133]:
df = pd.DataFrame(preds[0:20], columns = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'])
df.style.format("{:.2%}").background_gradient(cmap=cm) 
Out[133]:
  airplane automobile bird cat deer dog frog horse ship truck
0 5.39% 5.07% 7.16% 37.58% 6.73% 22.54% 6.62% 3.66% 3.86% 1.39%
1 2.84% 27.22% 0.06% 0.08% 0.03% 0.01% 0.02% 0.02% 17.51% 52.21%
2 16.56% 21.33% 0.57% 0.39% 0.83% 0.16% 0.12% 0.33% 49.63% 10.08%
3 22.82% 22.14% 2.42% 1.36% 2.49% 0.91% 0.39% 2.69% 36.78% 7.99%
4 0.21% 0.07% 9.07% 1.65% 75.28% 1.97% 10.77% 0.80% 0.14% 0.04%
5 1.47% 1.18% 4.69% 17.45% 4.46% 9.75% 57.79% 2.21% 0.30% 0.70%
6 3.70% 38.37% 2.67% 23.76% 0.18% 18.22% 3.24% 4.31% 1.40% 4.14%
7 1.39% 1.25% 5.80% 5.46% 4.07% 2.37% 77.58% 0.46% 0.35% 1.27%
8 2.29% 0.44% 19.97% 32.15% 9.87% 25.39% 4.77% 3.73% 0.99% 0.41%
9 1.11% 78.61% 0.38% 0.58% 0.13% 0.17% 0.12% 0.10% 5.10% 13.68%
10 38.79% 0.91% 19.62% 4.99% 5.41% 2.70% 1.96% 0.78% 24.50% 0.33%
11 0.34% 25.11% 0.06% 0.16% 0.02% 0.03% 0.02% 0.03% 2.03% 72.20%
12 0.94% 2.07% 11.16% 11.44% 11.06% 15.10% 38.47% 7.75% 0.48% 1.51%
13 5.15% 2.01% 1.10% 1.35% 0.65% 4.07% 0.55% 83.22% 0.25% 1.65%
14 0.80% 54.72% 0.89% 1.92% 0.05% 0.66% 0.34% 0.35% 1.21% 39.06%
15 5.11% 6.01% 2.27% 7.14% 3.00% 5.53% 4.57% 1.42% 61.11% 3.83%
16 10.13% 2.50% 7.31% 18.24% 3.67% 27.04% 2.89% 24.27% 0.98% 2.98%
17 3.79% 0.41% 15.84% 19.30% 19.66% 14.54% 2.04% 18.64% 1.39% 4.40%
18 2.32% 3.78% 0.02% 0.03% 0.08% 0.00% 0.01% 0.01% 91.30% 2.46%
19 0.95% 0.59% 3.71% 6.96% 6.03% 12.49% 29.39% 38.17% 0.18% 1.55%

Plot t-SNE embedding

In [134]:
# Extracts the outputs of all layers:
layer_outputs = [layer.output for layer in model.layers]

# Creates a model that will return these outputs, given the model input:
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)

# Get activations of a hidden dense layer and of the output layer
# (activations[-4] is a dropout output, which is an identity at inference)
activations = activation_model.predict(valid_images_norm[:5000])
dense_layer_activations = activations[-4]
output_layer_activations = activations[-1]
In [135]:
# Reduce the dimension using t-SNE to visualize in a scatterplot
tsne = TSNE(n_components=2, verbose=1, perplexity=40, n_iter=300)
tsne_results = tsne.fit_transform(dense_layer_activations)

# Scaling
tsne_results = (tsne_results - tsne_results.min()) / (tsne_results.max() - tsne_results.min())
[t-SNE] Computing 121 nearest neighbors...
[t-SNE] Indexed 5000 samples in 0.002s...
[t-SNE] Computed neighbors for 5000 samples in 0.799s...
[t-SNE] Computed conditional probabilities for sample 1000 / 5000
[t-SNE] Computed conditional probabilities for sample 2000 / 5000
[t-SNE] Computed conditional probabilities for sample 3000 / 5000
[t-SNE] Computed conditional probabilities for sample 4000 / 5000
[t-SNE] Computed conditional probabilities for sample 5000 / 5000
[t-SNE] Mean sigma: 1.397671
[t-SNE] KL divergence after 250 iterations with early exaggeration: 80.499779
[t-SNE] KL divergence after 300 iterations: 2.400108
In [136]:
cmap = plt.cm.tab10
plt.figure(figsize=(16,10))
scatter = plt.scatter(tsne_results[:,0],tsne_results[:,1], c=valid_labels_split[:5000], s=10, cmap=cmap)
plt.legend(handles=scatter.legend_elements()[0], labels=class_names)

image_positions = np.array([[1., 1.]])
for index, position in enumerate(tsne_results):
    dist = np.sum((position - image_positions) ** 2, axis=1)
    if np.min(dist) > 0.02: # if far enough from other images
        image_positions = np.r_[image_positions, [position]]
        imagebox = mpl.offsetbox.AnnotationBbox(
            mpl.offsetbox.OffsetImage(train_images[index], cmap="binary"),
            position, bboxprops={"lw": 1})
        plt.gca().add_artist(imagebox)
plt.axis("off")
plt.show()

Experiment 7

CNN with 2 convolution/max-pooling blocks and regularization (L2 regularization, dropout, early stopping)

Create the Model

Build CNN Model

In [20]:
model = models.Sequential()

model.add(layers.Conv2D(filters=64, kernel_size=(3, 3), strides=(1, 1), activation=tf.nn.relu, input_shape=(32, 32, 3)))
model.add(layers.MaxPool2D((2, 2),strides=2))
model.add(layers.Dropout(0.3))

model.add(layers.Conv2D(filters=108, kernel_size=(3, 3), strides=(1, 1), activation=tf.nn.relu))
model.add(layers.MaxPool2D(pool_size=(2, 2),strides=2))
model.add(layers.Dropout(0.3))

model.add(layers.Flatten())
model.add(layers.Dense(units=210, activation=tf.nn.relu, kernel_regularizer=tf.keras.regularizers.L2(0.001)))
model.add(layers.Dense(units=10, activation=tf.nn.softmax, name="output_layer"))
In [21]:
model.summary()
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d_4 (Conv2D)           (None, 30, 30, 64)        1792      
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 15, 15, 64)       0         
 2D)                                                             
                                                                 
 dropout_2 (Dropout)         (None, 15, 15, 64)        0         
                                                                 
 conv2d_5 (Conv2D)           (None, 13, 13, 108)       62316     
                                                                 
 max_pooling2d_3 (MaxPooling  (None, 6, 6, 108)        0         
 2D)                                                             
                                                                 
 dropout_3 (Dropout)         (None, 6, 6, 108)         0         
                                                                 
 flatten_1 (Flatten)         (None, 3888)              0         
                                                                 
 dense_1 (Dense)             (None, 210)               816690    
                                                                 
 output_layer (Dense)        (None, 10)                2110      
                                                                 
=================================================================
Total params: 882,908
Trainable params: 882,908
Non-trainable params: 0
_________________________________________________________________
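The shapes and parameter counts in the summary above follow from two simple formulas: a 'valid' 3x3 convolution with stride 1 shrinks each spatial dimension by 2, a 2x2 max pool with stride 2 halves it (rounding down), and a Conv2D layer holds (kh * kw * in_channels + 1) * filters parameters. A quick check against the summary (helper names are illustrative):

```python
def conv_out(size, kernel=3, stride=1):
    """Spatial size after a 'valid' convolution."""
    return (size - kernel) // stride + 1

def conv_params(kernel, in_channels, filters):
    """Weights plus biases of a Conv2D layer."""
    return (kernel * kernel * in_channels + 1) * filters

size = conv_out(32)    # 30 after conv2d_4
size = size // 2       # 15 after max_pooling2d_2
size = conv_out(size)  # 13 after conv2d_5
size = size // 2       # 6 after max_pooling2d_3

print(conv_params(3, 3, 64))    # 1792
print(conv_params(3, 64, 108))  # 62316
print(size * size * 108)        # 3888 units after Flatten
```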
In [22]:
keras.utils.plot_model(model, "CIFAR10.png", show_shapes=True) 
Out[22]:

Compiling the model

tf.keras.losses.SparseCategoricalCrossentropy
https://www.tensorflow.org/api_docs/python/tf/keras/losses/SparseCategoricalCrossentropy
In [23]:
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['accuracy'])

Training the model

In [24]:
start = time.time()

history = model.fit(train_images_norm
                    ,train_labels_split
                    ,epochs=30
                    ,batch_size=500
                    ,validation_data=(valid_images_norm, valid_labels_split)
                    ,callbacks=[tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=3)],
                   )

print("Total time: ", time.time() - start, "seconds")
Epoch 1/30
90/90 [==============================] - 2s 17ms/step - loss: 2.0325 - accuracy: 0.3316 - val_loss: 1.6537 - val_accuracy: 0.4438
Epoch 2/30
90/90 [==============================] - 1s 14ms/step - loss: 1.5394 - accuracy: 0.4848 - val_loss: 1.4328 - val_accuracy: 0.5236
Epoch 3/30
90/90 [==============================] - 1s 14ms/step - loss: 1.3914 - accuracy: 0.5408 - val_loss: 1.3557 - val_accuracy: 0.5528
Epoch 4/30
90/90 [==============================] - 1s 14ms/step - loss: 1.2978 - accuracy: 0.5783 - val_loss: 1.2694 - val_accuracy: 0.5896
Epoch 5/30
90/90 [==============================] - 1s 14ms/step - loss: 1.2395 - accuracy: 0.6026 - val_loss: 1.1851 - val_accuracy: 0.6228
Epoch 6/30
90/90 [==============================] - 1s 14ms/step - loss: 1.1817 - accuracy: 0.6219 - val_loss: 1.1573 - val_accuracy: 0.6266
Epoch 7/30
90/90 [==============================] - 1s 14ms/step - loss: 1.1452 - accuracy: 0.6377 - val_loss: 1.1302 - val_accuracy: 0.6410
Epoch 8/30
90/90 [==============================] - 1s 14ms/step - loss: 1.1061 - accuracy: 0.6498 - val_loss: 1.1008 - val_accuracy: 0.6546
Epoch 9/30
90/90 [==============================] - 1s 14ms/step - loss: 1.0822 - accuracy: 0.6630 - val_loss: 1.0406 - val_accuracy: 0.6776
Epoch 10/30
90/90 [==============================] - 1s 14ms/step - loss: 1.0622 - accuracy: 0.6701 - val_loss: 1.0445 - val_accuracy: 0.6770
Epoch 11/30
90/90 [==============================] - 1s 14ms/step - loss: 1.0382 - accuracy: 0.6774 - val_loss: 1.0434 - val_accuracy: 0.6704
Epoch 12/30
90/90 [==============================] - 1s 15ms/step - loss: 1.0207 - accuracy: 0.6856 - val_loss: 1.0293 - val_accuracy: 0.6890
Epoch 13/30
90/90 [==============================] - 1s 15ms/step - loss: 0.9937 - accuracy: 0.6968 - val_loss: 1.0090 - val_accuracy: 0.6952
Epoch 14/30
90/90 [==============================] - 1s 14ms/step - loss: 0.9842 - accuracy: 0.6998 - val_loss: 0.9955 - val_accuracy: 0.6964
Epoch 15/30
90/90 [==============================] - 1s 14ms/step - loss: 0.9588 - accuracy: 0.7090 - val_loss: 0.9940 - val_accuracy: 0.7010
Epoch 16/30
90/90 [==============================] - 1s 14ms/step - loss: 0.9399 - accuracy: 0.7175 - val_loss: 0.9539 - val_accuracy: 0.7116
Epoch 17/30
90/90 [==============================] - 1s 15ms/step - loss: 0.9227 - accuracy: 0.7266 - val_loss: 0.9520 - val_accuracy: 0.7124
Epoch 18/30
90/90 [==============================] - 1s 14ms/step - loss: 0.9117 - accuracy: 0.7300 - val_loss: 0.9476 - val_accuracy: 0.7184
Epoch 19/30
90/90 [==============================] - 1s 15ms/step - loss: 0.9027 - accuracy: 0.7362 - val_loss: 0.9527 - val_accuracy: 0.7164
Epoch 20/30
90/90 [==============================] - 1s 14ms/step - loss: 0.8832 - accuracy: 0.7422 - val_loss: 0.9367 - val_accuracy: 0.7260
Epoch 21/30
90/90 [==============================] - 1s 14ms/step - loss: 0.8637 - accuracy: 0.7509 - val_loss: 0.9253 - val_accuracy: 0.7268
Epoch 22/30
90/90 [==============================] - 1s 14ms/step - loss: 0.8528 - accuracy: 0.7560 - val_loss: 0.9290 - val_accuracy: 0.7256
Epoch 23/30
90/90 [==============================] - 1s 14ms/step - loss: 0.8438 - accuracy: 0.7589 - val_loss: 0.9041 - val_accuracy: 0.7356
Epoch 24/30
90/90 [==============================] - 1s 15ms/step - loss: 0.8288 - accuracy: 0.7656 - val_loss: 0.9103 - val_accuracy: 0.7384
Epoch 25/30
90/90 [==============================] - 1s 14ms/step - loss: 0.8115 - accuracy: 0.7726 - val_loss: 0.9316 - val_accuracy: 0.7276
Epoch 26/30
90/90 [==============================] - 1s 14ms/step - loss: 0.8015 - accuracy: 0.7763 - val_loss: 0.9160 - val_accuracy: 0.7338
Epoch 27/30
90/90 [==============================] - 1s 14ms/step - loss: 0.7966 - accuracy: 0.7794 - val_loss: 0.9111 - val_accuracy: 0.7374
Total time:  36.713194847106934 seconds

Evaluate the model

In [25]:
loss, accuracy = model.evaluate(test_images_norm, test_labels)
print('test set accuracy: ', accuracy)
313/313 [==============================] - 1s 3ms/step - loss: 0.9364 - accuracy: 0.7325
test set accuracy:  0.7325000166893005

Plotting Performance Metrics

Matplotlib is used to create two side-by-side plots displaying the training and validation loss (left) and accuracy (right) for each training epoch.

In [26]:
history_dict = history.history
history_dict.keys()
Out[26]:
dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])
In [27]:
losses = history.history['loss']
accs = history.history['accuracy']
val_losses = history.history['val_loss']
val_accs = history.history['val_accuracy']
epochs = len(losses)
In [28]:
plt.figure(figsize=(16, 4))
for i, metrics in enumerate(zip([losses, accs], [val_losses, val_accs], ['Loss', 'Accuracy'])):
    plt.subplot(1, 2, i + 1)
    plt.plot(range(epochs), metrics[0], label='Training {}'.format(metrics[2]))
    plt.plot(range(epochs), metrics[1], label='Validation {}'.format(metrics[2]))
    plt.legend()
plt.show()

Confusion matrices

Using sklearn.metrics, visualize the confusion matrix.

In [29]:
pred1 = model.predict(test_images_norm)
pred1 = np.argmax(pred1, axis=1)
In [30]:
conf_mx = confusion_matrix(test_labels, pred1)
row_sums = conf_mx.sum(axis=1, keepdims=True)
norm_conf_mx = np.round((conf_mx / row_sums), 2)
In [31]:
plot_confusion_matrix(norm_conf_mx)

Visualize predictions

In [32]:
preds = model.predict(test_images_norm)
preds.shape
Out[32]:
(10000, 10)
In [33]:
cm = sns.light_palette((260, 75, 60), input="husl", as_cmap=True)
In [34]:
df = pd.DataFrame(preds[0:20], columns = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'])
df.style.format("{:.2%}").background_gradient(cmap=cm) 
Out[34]:
  airplane automobile bird cat deer dog frog horse ship truck
0 1.31% 0.57% 6.69% 59.21% 0.73% 16.69% 8.49% 0.18% 6.09% 0.02%
1 2.78% 37.79% 0.04% 0.00% 0.00% 0.00% 0.00% 0.00% 59.03% 0.36%
2 3.76% 36.28% 0.26% 0.59% 0.30% 0.10% 0.06% 0.15% 56.87% 1.62%
3 71.55% 3.69% 5.82% 0.18% 1.76% 0.00% 0.05% 0.03% 16.76% 0.14%
4 0.00% 0.00% 1.83% 6.67% 72.75% 0.21% 18.52% 0.00% 0.01% 0.00%
5 0.01% 0.05% 0.76% 2.07% 2.83% 1.07% 92.68% 0.49% 0.01% 0.03%
6 0.31% 96.91% 0.02% 0.51% 0.00% 0.93% 0.03% 0.13% 0.00% 1.17%
7 0.62% 0.05% 36.23% 10.21% 8.69% 1.18% 42.74% 0.14% 0.07% 0.06%
8 0.12% 0.00% 6.24% 75.60% 6.31% 8.68% 1.43% 1.62% 0.00% 0.00%
9 0.15% 92.38% 0.10% 0.02% 0.01% 0.01% 0.02% 0.00% 0.13% 7.18%
10 39.36% 0.07% 11.90% 6.08% 26.63% 10.39% 1.41% 0.78% 3.29% 0.09%
11 0.01% 0.41% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.06% 99.52%
12 0.04% 0.15% 3.31% 10.23% 1.31% 80.08% 1.62% 2.80% 0.44% 0.04%
13 0.00% 0.00% 0.00% 0.00% 0.01% 0.10% 0.00% 99.89% 0.00% 0.00%
14 0.06% 9.11% 0.01% 0.01% 0.00% 0.00% 0.01% 0.00% 0.71% 90.08%
15 0.64% 0.28% 0.20% 1.93% 0.77% 0.11% 9.03% 0.00% 87.02% 0.02%
16 0.00% 0.01% 0.19% 3.98% 0.02% 95.42% 0.03% 0.33% 0.01% 0.01%
17 1.47% 0.09% 4.53% 10.49% 12.13% 6.20% 1.07% 62.89% 0.54% 0.58%
18 1.58% 4.41% 0.00% 0.03% 0.02% 0.00% 0.03% 0.00% 89.94% 3.97%
19 0.00% 0.07% 0.03% 0.17% 0.42% 0.16% 98.92% 0.23% 0.00% 0.00%

Plot feature map

In [35]:
(_,_), (test_images, test_labels) = tf.keras.datasets.cifar10.load_data()

img = test_images[2004]
img_tensor = image.img_to_array(img)
img_tensor = np.expand_dims(img_tensor, axis=0)

class_names = ['airplane'
,'automobile'
,'bird'
,'cat'
,'deer'
,'dog'
,'frog' 
,'horse'
,'ship'
,'truck']

plt.imshow(img, cmap='viridis')
plt.axis('off')
plt.show()
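In the cell above, `img_to_array` yields a `(32, 32, 3)` array and `np.expand_dims` prepends the batch axis that `predict` expects. The same reshaping in plain NumPy, using a random stand-in for a CIFAR-10 image:

```python
import numpy as np

# A single 32x32 RGB image as a float array (random stand-in for a CIFAR-10 image)
img_tensor = np.random.rand(32, 32, 3).astype('float32')

# Models predict on batches, so add a leading batch dimension of size 1
batched = np.expand_dims(img_tensor, axis=0)

print(batched.shape)  # (1, 32, 32, 3)
```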
In [36]:
# Extracts the outputs of the top 8 layers:
layer_outputs = [layer.output for layer in model.layers[:8]]
# Creates a model that will return these outputs, given the model input:
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)
In [37]:
activations = activation_model.predict(img_tensor)
len(activations)
Out[37]:
8
In [38]:
layer_names = []
for layer in model.layers:
    layer_names.append(layer.name)
    
layer_names
Out[38]:
['conv2d_4',
 'max_pooling2d_2',
 'dropout_2',
 'conv2d_5',
 'max_pooling2d_3',
 'dropout_3',
 'flatten_1',
 'dense_1',
 'output_layer']
In [39]:
# These are the names of the layers, so we can have them as part of our plot
layer_names = []
for layer in model.layers[:6]:
    layer_names.append(layer.name)

images_per_row = 16

# Now let's display our feature maps
for layer_name, layer_activation in zip(layer_names, activations):
    # This is the number of features in the feature map
    n_features = layer_activation.shape[-1]

    # The feature map has shape (1, size, size, n_features)
    size = layer_activation.shape[1]

    # We will tile the activation channels in this matrix
    n_cols = n_features // images_per_row
    display_grid = np.zeros((size * n_cols, images_per_row * size))

    # We'll tile each filter into this big horizontal grid
    for col in range(n_cols):
        for row in range(images_per_row):
            channel_image = layer_activation[0,
                                             :, :,
                                             col * images_per_row + row]
            # Post-process the feature to make it visually palatable
            channel_image -= channel_image.mean()
            channel_image /= channel_image.std()
            channel_image *= 64
            channel_image += 128
            channel_image = np.clip(channel_image, 0, 255).astype('uint8')
            display_grid[col * size : (col + 1) * size,
                         row * size : (row + 1) * size] = channel_image

    # Display the grid
    scale = 1. / size
    plt.figure(figsize=(scale * display_grid.shape[1],
                        scale * display_grid.shape[0]))
    plt.title(layer_name)
    plt.grid(False)
    plt.imshow(display_grid, aspect='auto', cmap='viridis')
    
plt.show();
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:28: RuntimeWarning: invalid value encountered in true_divide
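The RuntimeWarning above is raised when a channel's activations are all equal (zero standard deviation), so the normalization step divides by zero. A guarded sketch of that post-processing step, skipping the division when the std is zero:

```python
import numpy as np

def normalize_channel(channel_image):
    # Center and scale for display; skip the division when std is zero
    channel_image = channel_image - channel_image.mean()
    std = channel_image.std()
    if std > 0:
        channel_image = channel_image / std
    channel_image = channel_image * 64 + 128
    return np.clip(channel_image, 0, 255).astype('uint8')

# An all-zero channel now maps cleanly to mid-gray with no warning
print(normalize_channel(np.zeros((4, 4)))[0, 0])  # 128
```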

Plot t-SNE embedding

In [40]:
# Extracts the outputs of all layers:
layer_outputs = [layer.output for layer in model.layers]

# Creates a model that will return these outputs, given the model input:
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)

# Get activation values for the last dense layer
activations = activation_model.predict(valid_images_norm[:5000])
dense_layer_activations = activations[-2]
output_layer_activations = activations[-1]
In [41]:
# Reduce the dimension using t-SNE to visualize in a scatterplot
tsne = TSNE(n_components=2, verbose=1, perplexity=40, n_iter=300)
tsne_results = tsne.fit_transform(dense_layer_activations)

# Scaling
tsne_results = (tsne_results - tsne_results.min()) / (tsne_results.max() - tsne_results.min())
/usr/local/lib/python3.7/dist-packages/sklearn/manifold/_t_sne.py:783: FutureWarning: The default initialization in TSNE will change from 'random' to 'pca' in 1.2.
  FutureWarning,
/usr/local/lib/python3.7/dist-packages/sklearn/manifold/_t_sne.py:793: FutureWarning: The default learning rate in TSNE will change from 200.0 to 'auto' in 1.2.
  FutureWarning,
[t-SNE] Computing 121 nearest neighbors...
[t-SNE] Indexed 5000 samples in 0.001s...
[t-SNE] Computed neighbors for 5000 samples in 0.664s...
[t-SNE] Computed conditional probabilities for sample 1000 / 5000
[t-SNE] Computed conditional probabilities for sample 2000 / 5000
[t-SNE] Computed conditional probabilities for sample 3000 / 5000
[t-SNE] Computed conditional probabilities for sample 4000 / 5000
[t-SNE] Computed conditional probabilities for sample 5000 / 5000
[t-SNE] Mean sigma: 1.907308
[t-SNE] KL divergence after 250 iterations with early exaggeration: 80.949333
[t-SNE] KL divergence after 300 iterations: 2.789860
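The min-max rescaling applied to `tsne_results` maps the embedding into [0, 1], so the 0.02 squared-distance threshold used for the image overlay does not depend on t-SNE's arbitrary output scale. The same formula in isolation, on a toy 2-D embedding:

```python
import numpy as np

# Toy 2-D embedding standing in for t-SNE output
emb = np.array([[-3.0, 1.0], [0.0, 2.0], [5.0, -1.0]])

# Global min-max scaling into [0, 1] (same formula as above)
scaled = (emb - emb.min()) / (emb.max() - emb.min())

print(scaled.min(), scaled.max())  # 0.0 1.0
```

Note this scales by the global min and max, not per-axis, so the two axes keep their relative spread.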
In [42]:
cmap = plt.cm.tab10
plt.figure(figsize=(16,10))
scatter = plt.scatter(tsne_results[:,0],tsne_results[:,1], c=valid_labels_split[:5000], s=10, cmap=cmap)
plt.legend(handles=scatter.legend_elements()[0], labels=class_names)

image_positions = np.array([[1., 1.]])
for index, position in enumerate(tsne_results):
    dist = np.sum((position - image_positions) ** 2, axis=1)
    if np.min(dist) > 0.02: # if far enough from other images
        image_positions = np.r_[image_positions, [position]]
        imagebox = mpl.offsetbox.AnnotationBbox(
            mpl.offsetbox.OffsetImage(train_images[index], cmap="binary"),
            position, bboxprops={"lw": 1})
        plt.gca().add_artist(imagebox)
plt.axis("off")
plt.show()

Experiment 8

CNN with 3 convolution/max pooling layers with regularization (L2 regularization, dropout, early stopping)

Create the Model

Build CNN Model

In [35]:
model = models.Sequential()

model.add(layers.Conv2D(filters=64, kernel_size=(3, 3), strides=(1, 1), activation=tf.nn.relu,input_shape=(32, 32, 3))) 
model.add(layers.MaxPool2D((2, 2),strides=2))
model.add(layers.Dropout(0.3))

model.add(layers.Conv2D(filters=108, kernel_size=(3, 3), strides=(1, 1), activation=tf.nn.relu))
model.add(layers.MaxPool2D(pool_size=(2, 2),strides=2))
model.add(layers.Dropout(0.3))

model.add(layers.Conv2D(filters=180, kernel_size=(3, 3), strides=(1, 1), activation=tf.nn.relu))
model.add(layers.MaxPool2D(pool_size=(2, 2),strides=2))
model.add(layers.Dropout(0.3))

model.add(layers.Flatten())
model.add(layers.Dense(units=210, activation=tf.nn.relu, kernel_regularizer=tf.keras.regularizers.L2(0.001)))
model.add(layers.Dense(units=10, activation=tf.nn.softmax, name="output_layer"))
In [36]:
model.summary()
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d_2 (Conv2D)           (None, 30, 30, 64)        1792      
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 15, 15, 64)       0         
 2D)                                                             
                                                                 
 dropout_2 (Dropout)         (None, 15, 15, 64)        0         
                                                                 
 conv2d_3 (Conv2D)           (None, 13, 13, 108)       62316     
                                                                 
 max_pooling2d_3 (MaxPooling  (None, 6, 6, 108)        0         
 2D)                                                             
                                                                 
 dropout_3 (Dropout)         (None, 6, 6, 108)         0         
                                                                 
 conv2d_4 (Conv2D)           (None, 4, 4, 180)         175140    
                                                                 
 max_pooling2d_4 (MaxPooling  (None, 2, 2, 180)        0         
 2D)                                                             
                                                                 
 dropout_4 (Dropout)         (None, 2, 2, 180)         0         
                                                                 
 flatten_1 (Flatten)         (None, 720)               0         
                                                                 
 dense_1 (Dense)             (None, 210)               151410    
                                                                 
 output_layer (Dense)        (None, 10)                2110      
                                                                 
=================================================================
Total params: 392,768
Trainable params: 392,768
Non-trainable params: 0
_________________________________________________________________
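The shapes and parameter counts in the summary follow from standard formulas: a 'valid' 3x3 convolution shrinks each spatial side by 2, 2x2 max pooling with stride 2 halves it (with floor), a Conv2D layer has (k*k*in_channels + 1)*filters parameters, and a Dense layer has (inputs + 1)*units. A quick check of the numbers above:

```python
# 'valid' conv with a k x k kernel: side shrinks by k - 1
def conv_out(side, k=3):
    return side - k + 1

# 2x2 max pooling with stride 2: side halves, rounded down
def pool_out(side):
    return side // 2

# Spatial sizes through the three conv/pool stages: 32 -> 30 -> 15 -> 13 -> 6 -> 4 -> 2
sides = [32]
for _ in range(3):
    sides.append(conv_out(sides[-1]))
    sides.append(pool_out(sides[-1]))
print(sides)  # [32, 30, 15, 13, 6, 4, 2]

# Conv2D parameters: (kernel_h * kernel_w * in_channels + 1 bias) * filters
def conv_params(k, cin, cout):
    return (k * k * cin + 1) * cout

print(conv_params(3, 3, 64))     # 1792
print(conv_params(3, 64, 108))   # 62316
print(conv_params(3, 108, 180))  # 175140

# Dense parameters: (inputs + 1 bias) * units; flatten gives 2 * 2 * 180 = 720
print((720 + 1) * 210)           # 151410
print((210 + 1) * 10)            # 2110
```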
In [37]:
keras.utils.plot_model(model, "CIFAR10.png", show_shapes=True) 
Out[37]:

Compiling the model

tf.keras.losses.SparseCategoricalCrossentropy
https://www.tensorflow.org/api_docs/python/tf/keras/losses/SparseCategoricalCrossentropy
In [38]:
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['accuracy'])
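With `from_logits=False` the loss expects probabilities (the softmax output) and integer class labels, and computes the negative log of the probability assigned to the true class. A NumPy sketch of the per-sample loss, not the TensorFlow implementation:

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, y_prob):
    # y_true: integer labels, shape (n,); y_prob: class probabilities, shape (n, classes)
    picked = y_prob[np.arange(len(y_true)), y_true]
    return -np.log(picked)

# Two samples over 3 classes; the first prediction is confident and correct
y_prob = np.array([[0.9, 0.05, 0.05],
                   [0.2, 0.5, 0.3]])
y_true = np.array([0, 2])

losses = sparse_categorical_crossentropy(y_true, y_prob)
print(np.round(losses, 3))  # [0.105 1.204]
```

Keras reports the mean of these per-sample values as the batch loss.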

Training the model

In [39]:
start = time.time()

history = model.fit(train_images_norm
                    ,train_labels_split
                    ,epochs=30
                    ,batch_size=500
                    ,validation_data=(valid_images_norm, valid_labels_split)
                   ,callbacks=[tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=3)],
                   )

print("Total time: ", time.time() - start, "seconds")
Epoch 1/30
90/90 [==============================] - 4s 31ms/step - loss: 2.1352 - accuracy: 0.2789 - val_loss: 1.8295 - val_accuracy: 0.3756
Epoch 2/30
90/90 [==============================] - 3s 29ms/step - loss: 1.6731 - accuracy: 0.4256 - val_loss: 1.5189 - val_accuracy: 0.4984
Epoch 3/30
90/90 [==============================] - 3s 28ms/step - loss: 1.4918 - accuracy: 0.4921 - val_loss: 1.3718 - val_accuracy: 0.5484
Epoch 4/30
90/90 [==============================] - 3s 28ms/step - loss: 1.3806 - accuracy: 0.5299 - val_loss: 1.2832 - val_accuracy: 0.5688
Epoch 5/30
90/90 [==============================] - 3s 29ms/step - loss: 1.3029 - accuracy: 0.5603 - val_loss: 1.2086 - val_accuracy: 0.6110
Epoch 6/30
90/90 [==============================] - 3s 28ms/step - loss: 1.2357 - accuracy: 0.5874 - val_loss: 1.1215 - val_accuracy: 0.6244
Epoch 7/30
90/90 [==============================] - 3s 29ms/step - loss: 1.1833 - accuracy: 0.6014 - val_loss: 1.0788 - val_accuracy: 0.6394
Epoch 8/30
90/90 [==============================] - 3s 29ms/step - loss: 1.1432 - accuracy: 0.6159 - val_loss: 1.0422 - val_accuracy: 0.6556
Epoch 9/30
90/90 [==============================] - 3s 28ms/step - loss: 1.1073 - accuracy: 0.6287 - val_loss: 1.0220 - val_accuracy: 0.6674
Epoch 10/30
90/90 [==============================] - 3s 29ms/step - loss: 1.0718 - accuracy: 0.6450 - val_loss: 0.9581 - val_accuracy: 0.6776
Epoch 11/30
90/90 [==============================] - 3s 28ms/step - loss: 1.0367 - accuracy: 0.6558 - val_loss: 0.9487 - val_accuracy: 0.6894
Epoch 12/30
90/90 [==============================] - 3s 32ms/step - loss: 1.0098 - accuracy: 0.6647 - val_loss: 0.9145 - val_accuracy: 0.7068
Epoch 13/30
90/90 [==============================] - 3s 33ms/step - loss: 0.9887 - accuracy: 0.6733 - val_loss: 0.8932 - val_accuracy: 0.7158
Epoch 14/30
90/90 [==============================] - 3s 28ms/step - loss: 0.9582 - accuracy: 0.6838 - val_loss: 0.8826 - val_accuracy: 0.7056
Epoch 15/30
90/90 [==============================] - 3s 29ms/step - loss: 0.9413 - accuracy: 0.6886 - val_loss: 0.8591 - val_accuracy: 0.7178
Epoch 16/30
90/90 [==============================] - 3s 29ms/step - loss: 0.9184 - accuracy: 0.6957 - val_loss: 0.8413 - val_accuracy: 0.7262
Epoch 17/30
90/90 [==============================] - 3s 28ms/step - loss: 0.9070 - accuracy: 0.7015 - val_loss: 0.8413 - val_accuracy: 0.7296
Epoch 18/30
90/90 [==============================] - 3s 28ms/step - loss: 0.8862 - accuracy: 0.7100 - val_loss: 0.8165 - val_accuracy: 0.7354
Epoch 19/30
90/90 [==============================] - 3s 29ms/step - loss: 0.8772 - accuracy: 0.7126 - val_loss: 0.8095 - val_accuracy: 0.7408
Epoch 20/30
90/90 [==============================] - 3s 29ms/step - loss: 0.8622 - accuracy: 0.7162 - val_loss: 0.8031 - val_accuracy: 0.7424
Epoch 21/30
90/90 [==============================] - 3s 29ms/step - loss: 0.8421 - accuracy: 0.7223 - val_loss: 0.7903 - val_accuracy: 0.7426
Epoch 22/30
90/90 [==============================] - 3s 29ms/step - loss: 0.8308 - accuracy: 0.7288 - val_loss: 0.7774 - val_accuracy: 0.7522
Epoch 23/30
90/90 [==============================] - 3s 28ms/step - loss: 0.8217 - accuracy: 0.7295 - val_loss: 0.7724 - val_accuracy: 0.7504
Epoch 24/30
90/90 [==============================] - 3s 29ms/step - loss: 0.8138 - accuracy: 0.7312 - val_loss: 0.7723 - val_accuracy: 0.7516
Epoch 25/30
90/90 [==============================] - 3s 29ms/step - loss: 0.7942 - accuracy: 0.7419 - val_loss: 0.7608 - val_accuracy: 0.7588
Epoch 26/30
90/90 [==============================] - 3s 29ms/step - loss: 0.7889 - accuracy: 0.7420 - val_loss: 0.7455 - val_accuracy: 0.7566
Epoch 27/30
90/90 [==============================] - 3s 29ms/step - loss: 0.7826 - accuracy: 0.7450 - val_loss: 0.7386 - val_accuracy: 0.7662
Epoch 28/30
90/90 [==============================] - 3s 28ms/step - loss: 0.7666 - accuracy: 0.7517 - val_loss: 0.7323 - val_accuracy: 0.7640
Epoch 29/30
90/90 [==============================] - 3s 33ms/step - loss: 0.7577 - accuracy: 0.7547 - val_loss: 0.7401 - val_accuracy: 0.7630
Epoch 30/30
90/90 [==============================] - 3s 29ms/step - loss: 0.7621 - accuracy: 0.7536 - val_loss: 0.7180 - val_accuracy: 0.7730
Total time:  80.60542130470276 seconds
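The EarlyStopping callback monitors `val_accuracy` with `patience=3`: training stops once validation accuracy has failed to improve for three consecutive epochs. In this run it kept improving within that window, so all 30 epochs executed. A pure-Python sketch of the stopping rule (a hypothetical helper, not the Keras implementation):

```python
def stopped_epoch(val_accs, patience=3):
    """Return the 1-based epoch training stops after, or len(val_accs) if never triggered."""
    best = float('-inf')
    wait = 0
    for epoch, acc in enumerate(val_accs, start=1):
        if acc > best:
            best = acc
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(val_accs)

# Improvement stalls after the 4th value, so training stops 3 epochs later
print(stopped_epoch([0.50, 0.60, 0.65, 0.70, 0.69, 0.70, 0.68]))  # 7
```

Passing `restore_best_weights=True` to the real callback would additionally roll the model back to the best-scoring epoch's weights.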

Evaluate the model

In [40]:
loss, accuracy = model.evaluate(test_images_norm, test_labels)
print('test set accuracy: ', accuracy)
313/313 [==============================] - 1s 4ms/step - loss: 0.7437 - accuracy: 0.7626
test set accuracy:  0.7626000046730042

Plotting Performance Metrics

Matplotlib is used to create two side-by-side plots: training and validation loss, and training and validation accuracy, for each training epoch.

In [41]:
history_dict = history.history
history_dict.keys()
Out[41]:
dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])
In [42]:
losses = history.history['loss']
accs = history.history['accuracy']
val_losses = history.history['val_loss']
val_accs = history.history['val_accuracy']
epochs = len(losses)
In [43]:
plt.figure(figsize=(16, 4))
for i, metrics in enumerate(zip([losses, accs], [val_losses, val_accs], ['Loss', 'Accuracy'])):
    plt.subplot(1, 2, i + 1)
    plt.plot(range(epochs), metrics[0], label='Training {}'.format(metrics[2]))
    plt.plot(range(epochs), metrics[1], label='Validation {}'.format(metrics[2]))
    plt.legend()
plt.show()

Confusion matrices

Using sklearn.metrics, visualize the confusion matrix.

In [44]:
pred1 = model.predict(test_images_norm)
pred1 = np.argmax(pred1, axis=1)
In [45]:
conf_mx = confusion_matrix(test_labels, pred1)
row_sums = conf_mx.sum(axis=1, keepdims=True)
norm_conf_mx = np.round((conf_mx / row_sums), 2)
In [46]:
plot_confusion_matrix(norm_conf_mx)

Visualize predictions

In [47]:
preds = model.predict(test_images_norm)
preds.shape
Out[47]:
(10000, 10)
In [48]:
cm = sns.light_palette((260, 75, 60), input="husl", as_cmap=True)
In [49]:
df = pd.DataFrame(preds[0:20], columns = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'])
df.style.format("{:.2%}").background_gradient(cmap=cm) 
Out[49]:
  airplane automobile bird cat deer dog frog horse ship truck
0 0.06% 0.19% 0.13% 86.88% 0.07% 8.21% 3.63% 0.05% 0.70% 0.08%
1 2.60% 3.62% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 93.63% 0.15%
2 7.22% 11.00% 0.17% 0.22% 0.09% 0.04% 0.10% 0.07% 78.67% 2.42%
3 85.17% 3.06% 2.12% 0.44% 1.40% 0.01% 0.09% 0.02% 7.48% 0.21%
4 0.00% 0.01% 1.48% 2.59% 41.80% 0.07% 54.05% 0.00% 0.00% 0.00%
5 0.00% 0.01% 0.22% 3.08% 0.22% 1.61% 94.75% 0.11% 0.00% 0.01%
6 1.18% 72.99% 0.40% 3.60% 0.00% 2.05% 0.64% 0.48% 0.12% 18.53%
7 2.55% 0.06% 27.43% 6.94% 8.99% 1.59% 51.82% 0.23% 0.05% 0.33%
8 0.04% 0.01% 1.80% 74.12% 6.07% 13.31% 1.60% 3.01% 0.02% 0.02%
9 2.67% 77.30% 0.11% 0.06% 0.03% 0.01% 0.11% 0.03% 0.75% 18.93%
10 23.30% 0.14% 7.10% 10.17% 15.93% 11.78% 0.28% 20.95% 8.87% 1.48%
11 0.01% 0.18% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.01% 99.80%
12 0.02% 0.07% 4.58% 15.81% 6.42% 60.98% 9.58% 2.44% 0.09% 0.01%
13 0.00% 0.00% 0.00% 0.00% 0.03% 0.12% 0.00% 99.85% 0.00% 0.00%
14 0.15% 1.29% 0.01% 0.02% 0.00% 0.00% 0.00% 0.09% 0.11% 98.32%
15 2.93% 2.15% 17.59% 6.59% 3.45% 0.24% 33.51% 0.01% 33.48% 0.06%
16 0.00% 0.02% 0.90% 20.34% 0.03% 76.34% 0.18% 2.17% 0.01% 0.02%
17 0.32% 0.07% 1.85% 16.70% 3.05% 13.17% 1.78% 61.88% 0.13% 1.06%
18 4.07% 3.00% 0.00% 0.05% 0.02% 0.01% 0.02% 0.02% 77.38% 15.41%
19 0.00% 0.01% 0.03% 0.03% 0.02% 0.01% 99.91% 0.00% 0.00% 0.00%

Plot feature map

In [50]:
(_,_), (test_images, test_labels) = tf.keras.datasets.cifar10.load_data()

img = test_images[2004]
img_tensor = image.img_to_array(img)
img_tensor = np.expand_dims(img_tensor, axis=0)

class_names = ['airplane'
,'automobile'
,'bird'
,'cat'
,'deer'
,'dog'
,'frog' 
,'horse'
,'ship'
,'truck']

plt.imshow(img, cmap='viridis')
plt.axis('off')
plt.show()
In [51]:
# Extracts the outputs of the top 8 layers:
layer_outputs = [layer.output for layer in model.layers[:8]]
# Creates a model that will return these outputs, given the model input:
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)
In [52]:
activations = activation_model.predict(img_tensor)
len(activations)
Out[52]:
8
In [53]:
layer_names = []
for layer in model.layers:
    layer_names.append(layer.name)
    
layer_names
Out[53]:
['conv2d_2',
 'max_pooling2d_2',
 'dropout_2',
 'conv2d_3',
 'max_pooling2d_3',
 'dropout_3',
 'conv2d_4',
 'max_pooling2d_4',
 'dropout_4',
 'flatten_1',
 'dense_1',
 'output_layer']
In [54]:
# These are the names of the layers, so we can have them as part of our plot
layer_names = []
for layer in model.layers[:8]:
    layer_names.append(layer.name)

images_per_row = 16

# Now let's display our feature maps
for layer_name, layer_activation in zip(layer_names, activations):
    # This is the number of features in the feature map
    n_features = layer_activation.shape[-1]

    # The feature map has shape (1, size, size, n_features)
    size = layer_activation.shape[1]

    # We will tile the activation channels in this matrix
    n_cols = n_features // images_per_row
    display_grid = np.zeros((size * n_cols, images_per_row * size))

    # We'll tile each filter into this big horizontal grid
    for col in range(n_cols):
        for row in range(images_per_row):
            channel_image = layer_activation[0,
                                             :, :,
                                             col * images_per_row + row]
            # Post-process the feature to make it visually palatable
            channel_image -= channel_image.mean()
            channel_image /= channel_image.std()
            channel_image *= 64
            channel_image += 128
            channel_image = np.clip(channel_image, 0, 255).astype('uint8')
            display_grid[col * size : (col + 1) * size,
                         row * size : (row + 1) * size] = channel_image

    # Display the grid
    scale = 1. / size
    plt.figure(figsize=(scale * display_grid.shape[1],
                        scale * display_grid.shape[0]))
    plt.title(layer_name)
    plt.grid(False)
    plt.imshow(display_grid, aspect='auto', cmap='viridis')
    
plt.show();
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:28: RuntimeWarning: invalid value encountered in true_divide

Plot t-SNE embedding

In [55]:
# Extracts the outputs of all layers:
layer_outputs = [layer.output for layer in model.layers]

# Creates a model that will return these outputs, given the model input:
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)

# Get activation values for the last dense layer
activations = activation_model.predict(valid_images_norm[:5000])
dense_layer_activations = activations[-2]
output_layer_activations = activations[-1]
In [56]:
# Reduce the dimension using t-SNE to visualize in a scatterplot
tsne = TSNE(n_components=2, verbose=1, perplexity=40, n_iter=300)
tsne_results = tsne.fit_transform(dense_layer_activations)

# Scaling
tsne_results = (tsne_results - tsne_results.min()) / (tsne_results.max() - tsne_results.min())
/usr/local/lib/python3.7/dist-packages/sklearn/manifold/_t_sne.py:783: FutureWarning: The default initialization in TSNE will change from 'random' to 'pca' in 1.2.
  FutureWarning,
/usr/local/lib/python3.7/dist-packages/sklearn/manifold/_t_sne.py:793: FutureWarning: The default learning rate in TSNE will change from 200.0 to 'auto' in 1.2.
  FutureWarning,
[t-SNE] Computing 121 nearest neighbors...
[t-SNE] Indexed 5000 samples in 0.002s...
[t-SNE] Computed neighbors for 5000 samples in 0.853s...
[t-SNE] Computed conditional probabilities for sample 1000 / 5000
[t-SNE] Computed conditional probabilities for sample 2000 / 5000
[t-SNE] Computed conditional probabilities for sample 3000 / 5000
[t-SNE] Computed conditional probabilities for sample 4000 / 5000
[t-SNE] Computed conditional probabilities for sample 5000 / 5000
[t-SNE] Mean sigma: 0.917538
[t-SNE] KL divergence after 250 iterations with early exaggeration: 78.486435
[t-SNE] KL divergence after 300 iterations: 2.318424
In [57]:
cmap = plt.cm.tab10
plt.figure(figsize=(16,10))
scatter = plt.scatter(tsne_results[:,0],tsne_results[:,1], c=valid_labels_split[:5000], s=10, cmap=cmap)
plt.legend(handles=scatter.legend_elements()[0], labels=class_names)

image_positions = np.array([[1., 1.]])
for index, position in enumerate(tsne_results):
    dist = np.sum((position - image_positions) ** 2, axis=1)
    if np.min(dist) > 0.02: # if far enough from other images
        image_positions = np.r_[image_positions, [position]]
        imagebox = mpl.offsetbox.AnnotationBbox(
            mpl.offsetbox.OffsetImage(train_images[index], cmap="binary"),
            position, bboxprops={"lw": 1})
        plt.gca().add_artist(imagebox)
plt.axis("off")
plt.show()

Experiment 9

CNN with 3 convolution/max pooling layers with regularization (L2 regularization, dropout, early stopping) and increased complexity using a larger number of filters, nodes, and epochs than Experiment 8.

Create the Model

Build CNN Model

In [10]:
model = models.Sequential()

model.add(layers.Conv2D(filters=128, kernel_size=(3, 3), strides=(1, 1), activation=tf.nn.relu,input_shape=(32, 32, 3))) 
model.add(layers.MaxPool2D((2, 2),strides=2))
model.add(layers.Dropout(0.3))

model.add(layers.Conv2D(filters=216, kernel_size=(3, 3), strides=(1, 1), activation=tf.nn.relu))
model.add(layers.MaxPool2D(pool_size=(2, 2),strides=2))
model.add(layers.Dropout(0.3))

model.add(layers.Conv2D(filters=360, kernel_size=(3, 3), strides=(1, 1), activation=tf.nn.relu))
model.add(layers.MaxPool2D(pool_size=(2, 2),strides=2))
model.add(layers.Dropout(0.3))

model.add(layers.Flatten())
model.add(layers.Dense(units=420, activation=tf.nn.relu, kernel_regularizer=tf.keras.regularizers.L2(0.001)))
model.add(layers.Dense(units=10, activation=tf.nn.softmax, name="output_layer"))
In [11]:
model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d (Conv2D)             (None, 30, 30, 128)       3584      
                                                                 
 max_pooling2d (MaxPooling2D  (None, 15, 15, 128)      0         
 )                                                               
                                                                 
 dropout (Dropout)           (None, 15, 15, 128)       0         
                                                                 
 conv2d_1 (Conv2D)           (None, 13, 13, 216)       249048    
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 6, 6, 216)        0         
 2D)                                                             
                                                                 
 dropout_1 (Dropout)         (None, 6, 6, 216)         0         
                                                                 
 conv2d_2 (Conv2D)           (None, 4, 4, 360)         700200    
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 2, 2, 360)        0         
 2D)                                                             
                                                                 
 dropout_2 (Dropout)         (None, 2, 2, 360)         0         
                                                                 
 flatten (Flatten)           (None, 1440)              0         
                                                                 
 dense (Dense)               (None, 420)               605220    
                                                                 
 output_layer (Dense)        (None, 10)                4210      
                                                                 
=================================================================
Total params: 1,562,262
Trainable params: 1,562,262
Non-trainable params: 0
_________________________________________________________________
In [12]:
keras.utils.plot_model(model, "CIFAR10.png", show_shapes=True) 
Out[12]:

Compiling the model

tf.keras.losses.SparseCategoricalCrossentropy
https://www.tensorflow.org/api_docs/python/tf/keras/losses/SparseCategoricalCrossentropy
In [13]:
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['accuracy'])

Training the model

In [14]:
start = time.time()

history = model.fit(train_images_norm
                    ,train_labels_split
                    ,epochs=50
                    ,batch_size=500
                    ,validation_data=(valid_images_norm, valid_labels_split)
                   ,callbacks=[tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=3)],
                   )

print("Total time: ", time.time() - start, "seconds")
Epoch 1/50
90/90 [==============================] - 13s 39ms/step - loss: 2.1816 - accuracy: 0.2827 - val_loss: 1.7279 - val_accuracy: 0.4012
Epoch 2/50
90/90 [==============================] - 3s 34ms/step - loss: 1.6029 - accuracy: 0.4517 - val_loss: 1.4508 - val_accuracy: 0.5038
Epoch 3/50
90/90 [==============================] - 3s 33ms/step - loss: 1.3948 - accuracy: 0.5255 - val_loss: 1.3037 - val_accuracy: 0.5580
Epoch 4/50
90/90 [==============================] - 3s 33ms/step - loss: 1.2780 - accuracy: 0.5715 - val_loss: 1.1596 - val_accuracy: 0.6124
Epoch 5/50
90/90 [==============================] - 3s 33ms/step - loss: 1.1722 - accuracy: 0.6086 - val_loss: 1.0852 - val_accuracy: 0.6302
Epoch 6/50
90/90 [==============================] - 3s 33ms/step - loss: 1.1085 - accuracy: 0.6332 - val_loss: 1.0173 - val_accuracy: 0.6636
Epoch 7/50
90/90 [==============================] - 3s 33ms/step - loss: 1.0560 - accuracy: 0.6497 - val_loss: 1.0061 - val_accuracy: 0.6734
Epoch 8/50
90/90 [==============================] - 3s 33ms/step - loss: 0.9982 - accuracy: 0.6737 - val_loss: 0.9488 - val_accuracy: 0.6948
Epoch 9/50
90/90 [==============================] - 3s 33ms/step - loss: 0.9489 - accuracy: 0.6880 - val_loss: 0.8989 - val_accuracy: 0.7102
Epoch 10/50
90/90 [==============================] - 3s 33ms/step - loss: 0.9131 - accuracy: 0.7021 - val_loss: 0.8669 - val_accuracy: 0.7172
Epoch 11/50
90/90 [==============================] - 3s 33ms/step - loss: 0.8813 - accuracy: 0.7145 - val_loss: 0.9001 - val_accuracy: 0.7172
Epoch 12/50
90/90 [==============================] - 3s 33ms/step - loss: 0.8576 - accuracy: 0.7232 - val_loss: 0.8268 - val_accuracy: 0.7362
Epoch 13/50
90/90 [==============================] - 3s 33ms/step - loss: 0.8269 - accuracy: 0.7336 - val_loss: 0.7991 - val_accuracy: 0.7474
Epoch 14/50
90/90 [==============================] - 3s 33ms/step - loss: 0.8092 - accuracy: 0.7417 - val_loss: 0.8087 - val_accuracy: 0.7428
Epoch 15/50
90/90 [==============================] - 3s 33ms/step - loss: 0.7847 - accuracy: 0.7499 - val_loss: 0.7720 - val_accuracy: 0.7600
Epoch 16/50
90/90 [==============================] - 3s 34ms/step - loss: 0.7721 - accuracy: 0.7533 - val_loss: 0.7557 - val_accuracy: 0.7628
Epoch 17/50
90/90 [==============================] - 3s 35ms/step - loss: 0.7482 - accuracy: 0.7618 - val_loss: 0.7739 - val_accuracy: 0.7516
Epoch 18/50
90/90 [==============================] - 3s 33ms/step - loss: 0.7255 - accuracy: 0.7701 - val_loss: 0.7542 - val_accuracy: 0.7582
Epoch 19/50
90/90 [==============================] - 3s 33ms/step - loss: 0.7178 - accuracy: 0.7717 - val_loss: 0.7272 - val_accuracy: 0.7720
Epoch 20/50
90/90 [==============================] - 3s 33ms/step - loss: 0.6978 - accuracy: 0.7800 - val_loss: 0.7146 - val_accuracy: 0.7734
Epoch 21/50
90/90 [==============================] - 3s 33ms/step - loss: 0.6870 - accuracy: 0.7831 - val_loss: 0.7084 - val_accuracy: 0.7780
Epoch 22/50
90/90 [==============================] - 3s 33ms/step - loss: 0.6871 - accuracy: 0.7834 - val_loss: 0.7133 - val_accuracy: 0.7750
Epoch 23/50
90/90 [==============================] - 3s 33ms/step - loss: 0.6611 - accuracy: 0.7931 - val_loss: 0.7335 - val_accuracy: 0.7656
Epoch 24/50
90/90 [==============================] - 3s 33ms/step - loss: 0.6509 - accuracy: 0.7959 - val_loss: 0.7037 - val_accuracy: 0.7810
Epoch 25/50
90/90 [==============================] - 3s 33ms/step - loss: 0.6468 - accuracy: 0.7982 - val_loss: 0.7157 - val_accuracy: 0.7782
Epoch 26/50
90/90 [==============================] - 3s 34ms/step - loss: 0.6360 - accuracy: 0.8021 - val_loss: 0.6818 - val_accuracy: 0.7936
Epoch 27/50
90/90 [==============================] - 3s 33ms/step - loss: 0.6167 - accuracy: 0.8102 - val_loss: 0.7025 - val_accuracy: 0.7858
Epoch 28/50
90/90 [==============================] - 3s 33ms/step - loss: 0.6092 - accuracy: 0.8132 - val_loss: 0.6944 - val_accuracy: 0.7852
Epoch 29/50
90/90 [==============================] - 3s 33ms/step - loss: 0.5967 - accuracy: 0.8159 - val_loss: 0.6689 - val_accuracy: 0.7956
Epoch 30/50
90/90 [==============================] - 3s 33ms/step - loss: 0.5860 - accuracy: 0.8211 - val_loss: 0.6683 - val_accuracy: 0.7958
Epoch 31/50
90/90 [==============================] - 3s 33ms/step - loss: 0.5735 - accuracy: 0.8264 - val_loss: 0.6995 - val_accuracy: 0.7822
Epoch 32/50
90/90 [==============================] - 3s 33ms/step - loss: 0.5723 - accuracy: 0.8239 - val_loss: 0.6809 - val_accuracy: 0.7960
Epoch 33/50
90/90 [==============================] - 3s 33ms/step - loss: 0.5633 - accuracy: 0.8276 - val_loss: 0.6736 - val_accuracy: 0.7920
Epoch 34/50
90/90 [==============================] - 3s 33ms/step - loss: 0.5558 - accuracy: 0.8301 - val_loss: 0.6602 - val_accuracy: 0.8024
Epoch 35/50
90/90 [==============================] - 3s 33ms/step - loss: 0.5552 - accuracy: 0.8317 - val_loss: 0.6673 - val_accuracy: 0.7980
Epoch 36/50
90/90 [==============================] - 3s 33ms/step - loss: 0.5393 - accuracy: 0.8372 - val_loss: 0.6690 - val_accuracy: 0.8022
Epoch 37/50
90/90 [==============================] - 3s 33ms/step - loss: 0.5327 - accuracy: 0.8404 - val_loss: 0.6598 - val_accuracy: 0.8004
Total time:  121.35661888122559 seconds

Evaluate the model

In [15]:
loss, accuracy = model.evaluate(test_images_norm, test_labels)
print('test set accuracy: ', accuracy)
313/313 [==============================] - 1s 4ms/step - loss: 0.6864 - accuracy: 0.7956
test set accuracy:  0.7955999970436096

Plotting Performance Metrics

Matplotlib is used to create two side-by-side plots: training vs. validation loss per epoch on the left, and training vs. validation accuracy per epoch on the right.

In [16]:
history_dict = history.history
history_dict.keys()
Out[16]:
dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])
In [17]:
losses = history.history['loss']
accs = history.history['accuracy']
val_losses = history.history['val_loss']
val_accs = history.history['val_accuracy']
epochs = len(losses)
In [18]:
plt.figure(figsize=(16, 4))
for i, metrics in enumerate(zip([losses, accs], [val_losses, val_accs], ['Loss', 'Accuracy'])):
    plt.subplot(1, 2, i + 1)
    plt.plot(range(epochs), metrics[0], label='Training {}'.format(metrics[2]))
    plt.plot(range(epochs), metrics[1], label='Validation {}'.format(metrics[2]))
    plt.legend()
plt.show()

Confusion matrices

Using sklearn.metrics, visualize the confusion matrix.

In [19]:
pred1 = model.predict(test_images_norm)
pred1 = np.argmax(pred1, axis=1)
In [20]:
conf_mx = confusion_matrix(test_labels, pred1)
row_sums = conf_mx.sum(axis=1, keepdims=True)
norm_conf_mx = np.round((conf_mx / row_sums), 2)
In [21]:
plot_confusion_matrix(norm_conf_mx)

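Dividing each row of the confusion matrix by its row sum turns raw counts into per-class recall, so the diagonal shows the fraction of each true class that was classified correctly. A minimal hand-check with a hypothetical 3-class matrix:

```python
import numpy as np

# Hypothetical 3-class confusion matrix: rows are true classes, columns predictions
conf = np.array([[8, 1, 1],
                 [2, 6, 2],
                 [0, 3, 7]])

# Row-normalize so each row sums to 1; the diagonal is then per-class recall
row_sums = conf.sum(axis=1, keepdims=True)
norm_conf = np.round(conf / row_sums, 2)

print(norm_conf)              # diagonal: 0.8, 0.6, 0.7
print(norm_conf.sum(axis=1))  # each row sums to 1
```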
Visualize predictions

In [22]:
preds = model.predict(test_images_norm)
preds.shape
Out[22]:
(10000, 10)
In [23]:
cm = sns.light_palette((260, 75, 60), input="husl", as_cmap=True)
In [24]:
df = pd.DataFrame(preds[0:20], columns = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'])
df.style.format("{:.2%}").background_gradient(cmap=cm) 
Out[24]:
  airplane automobile bird cat deer dog frog horse ship truck
0 1.72% 0.02% 0.23% 93.11% 0.06% 4.17% 0.09% 0.06% 0.52% 0.02%
1 0.30% 0.34% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 99.35% 0.01%
2 2.25% 4.50% 0.03% 0.58% 0.04% 0.04% 0.10% 0.03% 90.44% 1.99%
3 95.06% 0.41% 0.35% 1.00% 0.39% 0.00% 0.01% 0.01% 2.72% 0.06%
4 0.00% 0.00% 0.46% 0.14% 23.18% 0.00% 76.21% 0.00% 0.00% 0.00%
5 0.00% 0.00% 0.09% 0.53% 0.07% 0.19% 99.13% 0.00% 0.00% 0.00%
6 0.13% 81.42% 0.07% 1.94% 0.00% 1.55% 0.25% 0.52% 0.07% 14.05%
7 0.09% 0.01% 5.44% 1.62% 1.02% 0.24% 91.45% 0.01% 0.03% 0.07%
8 0.00% 0.00% 0.36% 94.15% 1.06% 2.56% 1.53% 0.33% 0.00% 0.00%
9 0.27% 48.61% 0.02% 0.04% 0.01% 0.00% 0.44% 0.00% 0.35% 50.26%
10 16.81% 0.02% 4.63% 11.12% 3.28% 7.67% 0.14% 5.91% 49.02% 1.40%
11 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00%
12 0.08% 0.00% 6.54% 31.66% 13.92% 42.35% 1.14% 4.26% 0.04% 0.00%
13 0.00% 0.00% 0.00% 0.00% 0.05% 0.03% 0.00% 99.91% 0.00% 0.00%
14 0.00% 0.03% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.01% 99.95%
15 6.06% 0.25% 13.92% 7.41% 37.03% 0.07% 20.29% 0.03% 14.86% 0.09%
16 0.00% 0.01% 0.14% 16.43% 0.02% 82.17% 0.16% 1.06% 0.01% 0.01%
17 0.34% 0.05% 0.90% 3.82% 0.80% 7.38% 0.51% 80.67% 0.10% 5.44%
18 0.59% 0.04% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 98.70% 0.66%
19 0.00% 0.00% 0.02% 0.01% 0.00% 0.00% 99.96% 0.00% 0.00% 0.00%
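Each row of the table above is a probability distribution over the 10 classes, and the predicted label is its argmax. A quick check using the first three rows (values reproduced from the table as fractions):

```python
import numpy as np

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# First three prediction rows from the table above
preds = np.array([
    [0.0172, 0.0002, 0.0023, 0.9311, 0.0006, 0.0417, 0.0009, 0.0006, 0.0052, 0.0002],
    [0.0030, 0.0034, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.9935, 0.0001],
    [0.0225, 0.0450, 0.0003, 0.0058, 0.0004, 0.0004, 0.0010, 0.0003, 0.9044, 0.0199],
])

for row in preds:
    print(class_names[int(np.argmax(row))])  # cat, ship, ship
```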

Plot t-SNE embedding

In [25]:
# Extracts the outputs of all layers:
layer_outputs = [layer.output for layer in model.layers]

# Creates a model that will return these outputs, given the model input:
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)

# Get activations for every layer on the first 5000 validation images;
# index -2 is the last hidden dense layer, -1 the softmax output layer
activations = activation_model.predict(valid_images_norm[:5000])
dense_layer_activations = activations[-2]
output_layer_activations = activations[-1]
In [26]:
# Reduce the dimensionality using t-SNE to visualize in a scatterplot
tsne = TSNE(n_components=2, verbose=1, perplexity=40, n_iter=300)
tsne_results = tsne.fit_transform(dense_layer_activations)

# Scaling
tsne_results = (tsne_results - tsne_results.min()) / (tsne_results.max() - tsne_results.min())
/usr/local/lib/python3.7/dist-packages/sklearn/manifold/_t_sne.py:783: FutureWarning: The default initialization in TSNE will change from 'random' to 'pca' in 1.2.
  FutureWarning,
/usr/local/lib/python3.7/dist-packages/sklearn/manifold/_t_sne.py:793: FutureWarning: The default learning rate in TSNE will change from 200.0 to 'auto' in 1.2.
  FutureWarning,
[t-SNE] Computing 121 nearest neighbors...
[t-SNE] Indexed 5000 samples in 0.002s...
[t-SNE] Computed neighbors for 5000 samples in 0.768s...
[t-SNE] Computed conditional probabilities for sample 1000 / 5000
[t-SNE] Computed conditional probabilities for sample 2000 / 5000
[t-SNE] Computed conditional probabilities for sample 3000 / 5000
[t-SNE] Computed conditional probabilities for sample 4000 / 5000
[t-SNE] Computed conditional probabilities for sample 5000 / 5000
[t-SNE] Mean sigma: 1.096052
[t-SNE] KL divergence after 250 iterations with early exaggeration: 77.584763
[t-SNE] KL divergence after 300 iterations: 2.273604
In [27]:
cmap = plt.cm.tab10
plt.figure(figsize=(16,10))
scatter = plt.scatter(tsne_results[:,0],tsne_results[:,1], c=valid_labels_split[:5000], s=10, cmap=cmap)
plt.legend(handles=scatter.legend_elements()[0], labels=class_names)

image_positions = np.array([[1., 1.]])
for index, position in enumerate(tsne_results):
    dist = np.sum((position - image_positions) ** 2, axis=1)
    if np.min(dist) > 0.02: # if far enough from other images
        image_positions = np.r_[image_positions, [position]]
        imagebox = mpl.offsetbox.AnnotationBbox(
            mpl.offsetbox.OffsetImage(train_images[index], cmap="binary"),
            position, bboxprops={"lw": 1})
        plt.gca().add_artist(imagebox)
plt.axis("off")
plt.show()

Experiment 10

CNN with 4 convolutional and 2 max-pooling layers, with regularization (batch normalization, L2 regularization, dropout, early stopping)

Create the Model

Build CNN Model

In [43]:
model = models.Sequential()

model.add(layers.Conv2D(filters=64, kernel_size=(3, 3), strides=(1, 1), activation=tf.nn.relu,input_shape=(32, 32, 3))) 
model.add(layers.Conv2D(filters=108, kernel_size=(3, 3), strides=(1, 1), activation=tf.nn.relu))
model.add(layers.MaxPool2D((2, 2),strides=2))
model.add(layers.Dropout(0.3))

model.add(layers.Conv2D(filters=180, kernel_size=(3, 3), strides=(1, 1), activation=tf.nn.relu))
model.add(layers.Conv2D(filters=210, kernel_size=(3, 3), strides=(1, 1), activation=tf.nn.relu))
model.add(layers.MaxPool2D(pool_size=(2, 2),strides=2))
model.add(layers.Dropout(0.3))

model.add(layers.Flatten())
model.add(layers.BatchNormalization())
model.add(layers.Dense(units=260, activation=tf.nn.relu, kernel_regularizer=tf.keras.regularizers.L2(0.001)))
model.add(layers.Dense(units=10, activation=tf.nn.softmax, name="output_layer"))
In [44]:
model.summary()
Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d_6 (Conv2D)           (None, 30, 30, 64)        1792      
                                                                 
 conv2d_7 (Conv2D)           (None, 28, 28, 108)       62316     
                                                                 
 max_pooling2d_4 (MaxPooling  (None, 14, 14, 108)      0         
 2D)                                                             
                                                                 
 dropout_4 (Dropout)         (None, 14, 14, 108)       0         
                                                                 
 conv2d_8 (Conv2D)           (None, 12, 12, 180)       175140    
                                                                 
 conv2d_9 (Conv2D)           (None, 10, 10, 210)       340410    
                                                                 
 max_pooling2d_5 (MaxPooling  (None, 5, 5, 210)        0         
 2D)                                                             
                                                                 
 dropout_5 (Dropout)         (None, 5, 5, 210)         0         
                                                                 
 flatten_2 (Flatten)         (None, 5250)              0         
                                                                 
 batch_normalization (BatchN  (None, 5250)             21000     
 ormalization)                                                   
                                                                 
 dense_2 (Dense)             (None, 260)               1365260   
                                                                 
 output_layer (Dense)        (None, 10)                2610      
                                                                 
=================================================================
Total params: 1,968,528
Trainable params: 1,958,028
Non-trainable params: 10,500
_________________________________________________________________
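The parameter counts in the summary can be reproduced by hand: a Conv2D layer has (kernel_h × kernel_w × input_channels + 1) × filters parameters (the +1 is the bias), BatchNormalization has 4 parameters per feature (gamma, beta, moving mean, moving variance), and a Dense layer has (inputs + 1) × units. A quick arithmetic check against the summary above:

```python
def conv_params(k, c_in, filters):
    # (kernel area * input channels + bias) per filter
    return (k * k * c_in + 1) * filters

def dense_params(n_in, units):
    return (n_in + 1) * units

counts = [
    conv_params(3, 3, 64),     # conv2d_6  -> 1792
    conv_params(3, 64, 108),   # conv2d_7  -> 62316
    conv_params(3, 108, 180),  # conv2d_8  -> 175140
    conv_params(3, 180, 210),  # conv2d_9  -> 340410
    4 * 5250,                  # batch_normalization -> 21000
    dense_params(5250, 260),   # dense_2   -> 1365260
    dense_params(260, 10),     # output_layer -> 2610
]
print(sum(counts))  # 1968528, matching "Total params: 1,968,528"
```

Note that only half of the batch-normalization parameters (gamma and beta, 2 × 5250 = 10,500) are trainable, which accounts for the "Non-trainable params: 10,500" line.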
In [45]:
keras.utils.plot_model(model, "CIFAR10.png", show_shapes=True) 
Out[45]:

Compiling the model

tf.keras.losses.SparseCategoricalCrossentropy
https://www.tensorflow.org/api_docs/python/tf/keras/losses/SparseCategoricalCrossentropy
In [46]:
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              metrics=['accuracy'])
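With from_logits=False, SparseCategoricalCrossentropy expects the softmax probabilities the output layer already produces, and it takes integer class labels rather than one-hot vectors (which is why no to_categorical call is needed here). The per-sample loss is simply the negative log of the probability assigned to the true class; a minimal NumPy check of that definition, using two made-up 3-class predictions:

```python
import numpy as np

# Hypothetical softmax outputs for two samples over 3 classes
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
labels = np.array([0, 2])  # integer labels, not one-hot

# Per-sample loss: -log(probability of the true class)
loss = -np.log(probs[np.arange(len(labels)), labels])
print(loss.mean())  # ~0.29, the mean of -log(0.7) and -log(0.8)
```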

Training the model

In [47]:
start = time.time()

history = model.fit(train_images_norm
                    ,train_labels_split
                    ,epochs=30
                    ,batch_size=500
                    ,validation_data=(valid_images_norm, valid_labels_split)
                    ,callbacks=[tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', patience=3)]
                    )

print("Total time: ", time.time() - start, "seconds")
Epoch 1/30
90/90 [==============================] - 5s 48ms/step - loss: 1.8255 - accuracy: 0.4747 - val_loss: 2.4163 - val_accuracy: 0.4020
Epoch 2/30
90/90 [==============================] - 4s 47ms/step - loss: 1.2198 - accuracy: 0.6459 - val_loss: 2.1539 - val_accuracy: 0.5722
Epoch 3/30
90/90 [==============================] - 4s 48ms/step - loss: 0.9742 - accuracy: 0.7161 - val_loss: 1.8217 - val_accuracy: 0.6324
Epoch 4/30
90/90 [==============================] - 4s 45ms/step - loss: 0.8483 - accuracy: 0.7549 - val_loss: 1.4376 - val_accuracy: 0.7042
Epoch 5/30
90/90 [==============================] - 4s 46ms/step - loss: 0.7587 - accuracy: 0.7849 - val_loss: 1.0312 - val_accuracy: 0.7586
Epoch 6/30
90/90 [==============================] - 4s 46ms/step - loss: 0.6965 - accuracy: 0.8109 - val_loss: 0.8290 - val_accuracy: 0.7820
Epoch 7/30
90/90 [==============================] - 4s 46ms/step - loss: 0.6424 - accuracy: 0.8289 - val_loss: 0.8656 - val_accuracy: 0.7516
Epoch 8/30
90/90 [==============================] - 4s 46ms/step - loss: 0.6096 - accuracy: 0.8438 - val_loss: 0.8117 - val_accuracy: 0.7786
Epoch 9/30
90/90 [==============================] - 4s 46ms/step - loss: 0.5654 - accuracy: 0.8602 - val_loss: 0.7886 - val_accuracy: 0.7906
Epoch 10/30
90/90 [==============================] - 4s 46ms/step - loss: 0.5404 - accuracy: 0.8722 - val_loss: 0.7726 - val_accuracy: 0.8044
Epoch 11/30
90/90 [==============================] - 4s 47ms/step - loss: 0.5185 - accuracy: 0.8788 - val_loss: 0.8563 - val_accuracy: 0.7718
Epoch 12/30
90/90 [==============================] - 4s 46ms/step - loss: 0.4988 - accuracy: 0.8905 - val_loss: 0.8100 - val_accuracy: 0.8010
Epoch 13/30
90/90 [==============================] - 4s 46ms/step - loss: 0.4884 - accuracy: 0.8966 - val_loss: 0.8141 - val_accuracy: 0.8006
Total time:  55.42354464530945 seconds
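The run stops at epoch 13 of 30 because EarlyStopping(monitor='val_accuracy', patience=3) halts training once validation accuracy has not improved for three consecutive epochs; the best value (0.8044) came at epoch 10. The callback's core stopping logic can be sketched as follows (a simplification: the real callback also supports min_delta, restore_best_weights, etc.):

```python
def early_stop_epoch(val_accs, patience=3):
    """Return the 1-based epoch at which training stops."""
    best = -float("inf")
    wait = 0
    for epoch, acc in enumerate(val_accs, start=1):
        if acc > best:
            best, wait = acc, 0  # new best: reset the patience counter
        else:
            wait += 1
            if wait >= patience:
                return epoch     # no improvement for `patience` epochs
    return len(val_accs)

# Validation accuracies logged above, epochs 1-13
val_accs = [0.4020, 0.5722, 0.6324, 0.7042, 0.7586, 0.7820, 0.7516,
            0.7786, 0.7906, 0.8044, 0.7718, 0.8010, 0.8006]
print(early_stop_epoch(val_accs))  # 13: epochs 11-13 never beat 0.8044
```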

Evaluate the model

In [48]:
loss, accuracy = model.evaluate(test_images_norm, test_labels)
print('test set accuracy: ', accuracy)
313/313 [==============================] - 1s 3ms/step - loss: 0.8356 - accuracy: 0.7946
test set accuracy:  0.7946000099182129

Plotting Performance Metrics

Matplotlib is used to create two side-by-side plots: training vs. validation loss per epoch on the left, and training vs. validation accuracy per epoch on the right.

In [49]:
history_dict = history.history
history_dict.keys()
Out[49]:
dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])
In [50]:
losses = history.history['loss']
accs = history.history['accuracy']
val_losses = history.history['val_loss']
val_accs = history.history['val_accuracy']
epochs = len(losses)
In [51]:
plt.figure(figsize=(16, 4))
for i, metrics in enumerate(zip([losses, accs], [val_losses, val_accs], ['Loss', 'Accuracy'])):
    plt.subplot(1, 2, i + 1)
    plt.plot(range(epochs), metrics[0], label='Training {}'.format(metrics[2]))
    plt.plot(range(epochs), metrics[1], label='Validation {}'.format(metrics[2]))
    plt.legend()
plt.show()

Confusion matrices

Using sklearn.metrics, visualize the confusion matrix.

In [52]:
pred1 = model.predict(test_images_norm)
pred1 = np.argmax(pred1, axis=1)
In [53]:
conf_mx = confusion_matrix(test_labels, pred1)
row_sums = conf_mx.sum(axis=1, keepdims=True)
norm_conf_mx = np.round((conf_mx / row_sums), 2)
In [54]:
plot_confusion_matrix(norm_conf_mx)

Visualize predictions

In [55]:
preds = model.predict(test_images_norm)
preds.shape
Out[55]:
(10000, 10)
In [56]:
cm = sns.light_palette((260, 75, 60), input="husl", as_cmap=True)
In [57]:
df = pd.DataFrame(preds[0:20], columns = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'])
df.style.format("{:.2%}").background_gradient(cmap=cm) 
Out[57]:
  airplane automobile bird cat deer dog frog horse ship truck
0 0.25% 0.05% 0.24% 96.83% 0.00% 1.68% 0.92% 0.00% 0.02% 0.01%
1 2.78% 5.69% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 91.53% 0.00%
2 10.71% 19.45% 0.00% 0.01% 0.00% 0.02% 0.00% 0.00% 69.71% 0.09%
3 96.12% 0.98% 0.11% 0.01% 0.00% 0.00% 0.00% 0.00% 2.75% 0.03%
4 0.00% 0.00% 2.43% 0.07% 9.58% 0.00% 87.91% 0.00% 0.00% 0.00%
5 0.01% 0.00% 0.02% 1.55% 0.32% 7.28% 90.73% 0.06% 0.00% 0.02%
6 9.80% 46.25% 0.31% 13.32% 0.00% 28.46% 0.09% 0.51% 0.00% 1.26%
7 0.24% 0.01% 3.06% 0.03% 0.43% 0.05% 95.86% 0.00% 0.00% 0.32%
8 0.01% 0.00% 0.01% 99.79% 0.13% 0.03% 0.00% 0.03% 0.00% 0.00%
9 0.24% 97.92% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.02% 1.83%
10 79.18% 0.01% 0.36% 7.99% 0.07% 8.07% 0.02% 3.29% 0.93% 0.09%
11 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00%
12 0.00% 0.03% 3.01% 14.85% 0.36% 79.72% 0.38% 1.56% 0.10% 0.00%
13 0.00% 0.00% 0.00% 0.00% 0.00% 0.05% 0.00% 99.95% 0.00% 0.00%
14 0.00% 0.03% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 99.96%
15 3.57% 0.01% 0.29% 0.08% 0.48% 0.00% 0.33% 0.00% 95.23% 0.00%
16 0.00% 0.00% 0.00% 0.83% 0.00% 99.16% 0.00% 0.01% 0.00% 0.00%
17 0.00% 0.00% 0.07% 0.13% 0.05% 1.53% 0.00% 98.18% 0.00% 0.03%
18 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 99.99% 0.00%
19 0.00% 0.00% 0.02% 0.00% 0.00% 0.00% 99.98% 0.00% 0.00% 0.00%

Plot t-SNE embedding

In [58]:
# Extracts the outputs of all layers:
layer_outputs = [layer.output for layer in model.layers]

# Creates a model that will return these outputs, given the model input:
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)

# Get activations for every layer on the first 5000 validation images;
# index -2 is the last hidden dense layer, -1 the softmax output layer
activations = activation_model.predict(valid_images_norm[:5000])
dense_layer_activations = activations[-2]
output_layer_activations = activations[-1]
In [59]:
# Reduce the dimensionality using t-SNE to visualize in a scatterplot
tsne = TSNE(n_components=2, verbose=1, perplexity=40, n_iter=300)
tsne_results = tsne.fit_transform(dense_layer_activations)

# Scaling
tsne_results = (tsne_results - tsne_results.min()) / (tsne_results.max() - tsne_results.min())
/usr/local/lib/python3.7/dist-packages/sklearn/manifold/_t_sne.py:783: FutureWarning: The default initialization in TSNE will change from 'random' to 'pca' in 1.2.
  FutureWarning,
/usr/local/lib/python3.7/dist-packages/sklearn/manifold/_t_sne.py:793: FutureWarning: The default learning rate in TSNE will change from 200.0 to 'auto' in 1.2.
  FutureWarning,
[t-SNE] Computing 121 nearest neighbors...
[t-SNE] Indexed 5000 samples in 0.001s...
[t-SNE] Computed neighbors for 5000 samples in 0.620s...
[t-SNE] Computed conditional probabilities for sample 1000 / 5000
[t-SNE] Computed conditional probabilities for sample 2000 / 5000
[t-SNE] Computed conditional probabilities for sample 3000 / 5000
[t-SNE] Computed conditional probabilities for sample 4000 / 5000
[t-SNE] Computed conditional probabilities for sample 5000 / 5000
[t-SNE] Mean sigma: 4.241502
[t-SNE] KL divergence after 250 iterations with early exaggeration: 81.231979
[t-SNE] KL divergence after 300 iterations: 2.674195
In [60]:
cmap = plt.cm.tab10
plt.figure(figsize=(16,10))
scatter = plt.scatter(tsne_results[:,0],tsne_results[:,1], c=valid_labels_split[:5000], s=10, cmap=cmap)
plt.legend(handles=scatter.legend_elements()[0], labels=class_names)

image_positions = np.array([[1., 1.]])
for index, position in enumerate(tsne_results):
    dist = np.sum((position - image_positions) ** 2, axis=1)
    if np.min(dist) > 0.02: # if far enough from other images
        image_positions = np.r_[image_positions, [position]]
        imagebox = mpl.offsetbox.AnnotationBbox(
            mpl.offsetbox.OffsetImage(train_images[index], cmap="binary"),
            position, bboxprops={"lw": 1})
        plt.gca().add_artist(imagebox)
plt.axis("off")
plt.show()